About the Role
What you’ll do
Translate product goals into measurable metrics and SLOs, and build a rigorous evaluation harness to continuously score agents performance
Develop feedback and optimization pipelines that uses both automated metrics and human-in-the-loop evaluation signals to improve agent behavior over time
Implement and scale optimization techniques such as Direct Preference Optimization (DPO), Proximal Policy Optimization (PPO), and reward modeling to improve agent performance.
Launch and support fine-tuned models in production environments with robust evaluation, rollback strategies, and performance monitoring.
Collaborate closely with applied AI/ML teams to translate state-of-the-art research in agentic reasoning, planning, and tool use into reliable, production-ready systems
What you bring
Strong technical expertise in software development, with understanding of agentic workflows—including reasoning loops, tool invocation, memory, and orchestration of autonomous AI agents.
Hands-on experience using Large Language Models, including prompt engineering, fine-tuning, model distillation, and deploying optimized models (e.g. via DPO, PPO) into production environments.
Leadership and mentorship capabilities, with a track record of guiding complex technical projects and supporting the growth of teammates through code/design reviews and technical direction.
Excellent communication and collaboration skills, with the ability to translate technical ideas into actionable plans and work effectively with cross-functional partners, including product and infrastructure teams.
Innovation mindset and commitment to continuous learning and a bias toward action, staying at the forefront of ML/AI trends, agentic systems research, and best practices in tooling, safety, and evaluation.