Nuvepro - Task Intelligence for the Enterprise
xAI · Model · Palo Alto, CA

Member of Technical Staff - RL Infrastructure

Comp: $180,000 – $440,000

Classified Tasks (19)

Automate 5% · Augment 89% · Human-Only 5%

Automate (1)

Fully handled by AI agents

Track and report model performance metrics on newly onboarded evaluation datasets

analytical

Augment (17)

AI assists, human decides

Create and maintain robust data pipelines for large-scale training and evaluation

technical

Design and implement comprehensive evaluation suites to benchmark large language models

technical

Build automation frameworks to increase researcher and engineer productivity

technical

Design and implement efficient, robust environments for agentic models to perform actions

technical

Add features to the evaluation framework to streamline researcher workflows and increase observability

technical

Onboard open-source evaluation datasets into the internal evaluation framework, including ingestion and validation

technical

Standardize preprocessing pipelines to prepare datasets for large-scale reinforcement learning training

technical

Create data augmentation pipelines to generate additional training data and integrate them into training workflows

technical

Build high-performance sandboxes, virtual machines, and simulations for agent testing

technical

Develop full-stack applications for automating workflows and visualizing data and metrics

technical

Improve alerts, metrics, and error handling for large-scale reinforcement learning jobs

operational

Refactor agent, data, evaluation, and training frameworks to improve modularity and maintainability

technical

Write unit tests to validate code correctness and support rapid development cycles

technical

Develop and maintain CI/CD pipelines to support rapid iteration from research to production

technical

Instrument observability and monitoring systems to track model performance and evaluation results

operational

Automate common workflows to reduce manual intervention and accelerate experimentation

operational

Prepare and validate datasets requiring complex preprocessing for large-scale RL training

technical

Human-Only (1)

Requires human judgment

Design operational procedures and coding standards to streamline transition from small-scale experiments to large-scale RL training

operational

Job description

ABOUT xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

ABOUT THE ROLE:

xAI is seeking experienced software engineers to create robust data pipelines, comprehensive evaluations for benchmarking LLMs, and automation frameworks to increase the productivity of researchers and engineers. Typical problems you will deal with include the following:

We have a new agentic model capability that we’d like to improve. How do we design an efficient and robust environment for the agent to perform actions in?

Evaluations and observability are a core part of knowing what we need to improve in our models. What new features can we add to our evaluation framework to ease the workflow of researchers & engineers and increase observability?

A new open-source evaluation dataset has been released and researchers would like to track our models’ performance on it. How should we onboard it into our internal evaluation framework?

Datasets have been collected that require complex pre-processing to prepare them for large-scale RL training. How do we standardize our preprocessing pipelines to minimize dataset onboarding time?

A researcher on the team has an idea for how to augment a dataset to produce additional training data. How should we go about creating the data augmentation pipeline?

RESPONSIBILITIES:

Creating and maintaining frameworks for agent, data, and model evaluation tasks.
Building environments for AI agents.
Building tools for automating common workflows.
Improving alerts, metrics, and error handling on large-scale RL jobs.
Refactoring existing agent, data, eval, and training frameworks for better modularity.
Designing operational procedures and coding standards to streamline the transition from small-scale experimentation to large-scale RL training.
Writing unit tests and CI/CD frameworks to support rapid development cycles.

BASIC QUALIFICATIONS:

Experience building and maintaining frameworks that are used by many engineers.
Experience in building high-performance sandboxes, virtual machines, and simulations.
Experience building full-stack apps for automating workflows and data visualization.
Experience in rapid iteration of research to production cycles.
Experience in test automation and CI/CD.

COMPENSATION AND BENEFITS:

$180,000 - $440,000 USD

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.

xAI is an equal opportunity employer. For details on data processing, view

Source: xAI careers · scraped 2026-05-22