Nuvepro - Task Intelligence for the Enterprise
xAI · Engineering · Palo Alto, CA

Data Engineer

Comp: $240,000 – $280,000

Classified Tasks (15)

Automate 0% · Augment 87% · Human-Only 13%

Augment (13)

AI assists, human decides

Develop systems, processes, and production code to power data acquisition, preparation, quality evaluation, and delivery for model training

technical

Collaborate with acquisition teams, ML engineers, and software engineers to identify data needs, build scalable data pipelines, and continuously improve data quality

operational

Determine what data is needed to improve model performance

analytical

Build production pipelines and systems that transform raw inputs into high-quality training data at scale

technical

Analyze the performance and impact of data used throughout the model training lifecycle

analytical

Investigate anomalous model behavior and identify the data issues that drive poor downstream performance

analytical

Design, build, and improve data cleaning, transformation, and quality-control steps to produce high-quality training data

technical

Research, evaluate, and develop frontier methods for improving data quality and effectiveness in AI model development

creative

Apply statistical techniques and empirical analysis to make data-driven decisions about dataset quality and model outcomes

analytical

Build and maintain production-grade data pipelines, tooling, and software systems that ingest, process, validate, and deliver training data

technical

Develop metrics, evaluation frameworks, and monitoring systems to assess how data quality influences model behavior at scale

analytical

Fuse data from multiple sources into reliable, usable datasets for research and production model training

technical

Create shared datasets, tooling, and intern

technical

Human-Only (2)

Requires human judgment

Partner with acquisition teams to identify where valuable data can be sourced

operational

Partner across teams to identify data needs and define the highest-impact opportunities for new data acquisition and improvement

operational

Job description

ABOUT xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

ABOUT THE ROLE:

At xAI, we are building AI systems that push the frontier of human knowledge and scientific discovery. High-quality data is fundamental to every stage of that mission. Our Data team is responsible for ensuring that the models are trained on the right data, in the right form, at the right quality, across every phase of the training lifecycle. This includes partnering closely with acquisition teams to identify where valuable data can be sourced, determining what data is needed to improve model performance, and building the production pipelines and systems that transform raw inputs into high-quality training data at scale. We work at the intersection of data, infrastructure, and machine learning to ensure our models train effectively and reliably.

As a Data Engineer / AI Engineer on xAI’s Data team, you will be responsible for developing the systems, processes, and production code that power data acquisition, preparation, quality evaluation, and delivery for model training. You will work closely with acquisition teams, ML engineers, and software engineers to identify data needs, build scalable data pipelines, and continuously improve the quality of the data that shapes model behavior.

The ideal candidate combines strong software engineering fundamentals and excellent coding practices with deep intuition for statistics, neural networks, and how data quality influences training outcomes.

RESPONSIBILITIES:

Analyze the performance and impact of data used throughout the model training lifecycle
Investigate anomalous model behavior and rigorously identify the data issues that drive poor downstream performance
Design, build, and improve the data cleaning, transformation, and quality-control steps required to produce high-quality training data
Research, evaluate, and develop frontier methods for improving data quality and effectiveness in AI model development
Apply statistical techniques and empirical analysis to make informed, data-driven decisions about dataset quality and model outcomes
Partner across teams to identify where data needs exist and define the highest-impact opportunities for new data acquisition and improvement
Build and maintain production-grade data pipelines, tooling, and software systems that ingest, process, validate, and deliver data for training
Develop metrics, evaluation frameworks, and monitoring systems to assess how data quality influences model behavior at scale
Fuse data from multiple sources into reliable, usable datasets for research and production model training
Create shared datasets, tooling, and intern

Source: xAI careers · scraped 2026-05-22