Nuvepro - Task Intelligence for the Enterprise
Anthropic · AI Research & Engineering · San Francisco, CA | New York City, NY | Seattle, WA

Machine Learning Systems Engineer, RL Engineering

Classified Tasks (13)

Automate 0% · Augment 100% · Human-Only 0%

Augment (13)

AI assists, human decides

Build and maintain systems that train large AI models for production and research.

technical

Implement and improve advanced machine learning training techniques to increase model capability, reliability, and steerability.

technical

Design, develop, and operate critical algorithms and infrastructure that researchers use to train models.

technical

Improve the performance, robustness, and usability of training systems to accelerate research progress.

technical

Maintain and enhance finetuning systems (including RLHF and related methods) used to train production and internal models.

technical

Profile the reinforcement learning training pipeline to identify performance bottlenecks and opportunities for improvement.

analytical

Build a system to regularly launch training jobs in a test environment to detect pipeline problems quickly.

operational

Modify finetuning systems so they support new model architectures.

technical

Build instrumentation to detect and eliminate Python GIL contention and other runtime contention in training code.

technical

Diagnose causes of training runs slowing down after a number of steps and implement fixes.

analytical

Implement stable, fast versions of new training algorithms proposed by researchers.

technical

Monitor, troubleshoot, and resolve failures and degradations in large-scale training jobs.

operational

Support research teams by providing tooling, infrastructure, and operational assistance for training experiments.

operational

Job description

About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the role:

You want to build the cutting-edge systems that train AI models like Claude. You're excited to work at the frontier of machine learning, implementing and improving advanced techniques to create ever more capable, reliable, and steerable AI. As an ML Systems Engineer on our Reinforcement Learning Engineering team, you'll be responsible for the critical algorithms and infrastructure that our researchers depend on to train models. Your work will directly enable breakthroughs in AI capabilities and safety. You'll focus obsessively on improving the performance, robustness, and usability of these systems so our research can progress as quickly as possible. You're energized by the challenge of supporting and empowering our research team in the mission to build beneficial AI systems.

Our finetuning researchers train our production Claude models and internal research models using RLHF and other related methods. Your job will be to build, maintain, and improve the algorithms and systems that these researchers use to train models. You’ll be responsible for improving the speed, reliability, and ease of use of these systems.

You may be a good fit if you:
- Have 4+ years of software engineering experience
- Like working on systems and tools that make other people more productive
- Are results-oriented, with a bias towards flexibility and impact
- Pick up slack, even if it goes outside your job description
- Enjoy pair programming (we love to pair!)
- Want to learn more about machine learning research
- Care about the societal impacts of your work

Strong candidates may also have experience with:
- High-performance, large-scale distributed systems
- Large-scale LLM training
- Python
- Implementing LLM finetuning algorithms, such as RLHF

Representative projects:
- Profiling our reinforcement learning pipeline to find opportunities for improvement
- Building a system that regularly launches training jobs in a test environment so that we can quickly detect problems in the training pipeline
- Making changes to our finetuning systems so they work on new model architectures
- Building instrumentation to detect and eliminate Python GIL contention in our training code
- Diagnosing why training runs have started slowing down after some number of steps, and fixing it
- Implementing a stable, fast version of a new training algorithm proposed by a researcher

Deadline to apply: None. Applications will be reviewed on a rolling basis.

The annual compensation range for this role is listed below. For sales roles, the range provided is the role’s On Target Earnings ("OTE") range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role.

Annual Salary: $500,000–$850,000 USD

Logistics

Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
Required field of study: A field relevant to the
Source: Anthropic careers · scraped 2026-05-22