Backend Software Engineer (Evals) San Francisco at OpenAI — task breakdown

--- BEGIN UNTRUSTED EXTERNAL CONTENT (source: https://openai.com/careers/backend-software-engineer-(evals)-san-francisco/) ---

Backend Software Engineer (Evals)
Support Automation - San Francisco and Seattle

About the Team

The Support Automation team at OpenAI scales the organization by applying cutting-edge AI models to real-world challenges, automating and enhancing work across the organization. From customer operations to engineering, we develop an ecosystem of automation products that empower our colleagues and drive impact. We're passionate about crafting products that serve those around us, blending rapid prototyping with a focus on long-term quality and reliability. By building reusable solutions, we create patterns that can be applied across diverse domains within OpenAI.

TLDR: this team uses OpenAI technology to improve OpenAI, and you’ll have the opportunity to leverage the full extent of our tech (both public and pre-released) to accomplish this mission.

About the Role

We’re looking for a Backend Software Engineer with experience in ML/LLM-heavy domains to help design and build evals infrastructure that measures the quality of OpenAI’s support automation. This is a deeply technical and highly cross-functional role in which you’ll build robust systems and backend services that serve as the foundation for how knowledge is created, accessed, and applied across OpenAI.
The role will focus especially on working closely with Data Science and Research partners to design and build evals at scale.

In this role, you will:
- Design eval pipelines that are reliable, reproducible, and extensible
- Build the infrastructure for continuous eval monitoring frameworks (regression/drift monitoring, robust golden datasets), along with feedback loops that ultimately strengthen support automation
- Design, build, and maintain backend services and APIs to support intelligent automation and knowledge systems
- Integrate and structure data across internal platforms, transforming it into formats optimized for downstream systems and AI workflows
- Collaborate closely with data, research, and engineering teams to integrate OpenAI models into high-leverage workflows
- Own the full development lifecycle of new backend systems and internal platform capabilities
- Build with scale and maintainability in mind, while rapidly iterating on new ideas

You might be a great fit if you have:
- 4+ years of backend engineering experience at product-driven companies (excluding internships)
- Proficiency in backend technologies (our tech stack includes Python, FastAPI, and Postgres)
- Experience designing and scaling distributed systems, APIs, or data processing pipelines
- Experience building AI agents or applications, including designing evals and improving performance through prompting or scaffolding
- Familiarity with evaluation methods for LLMs and experience with patterns such as multi-agent workflows, tool use, or long context
- Experience creating production evals and/or measuring the performance of ML/LLM models at scale
- A pragmatic mindset: you’re comfortable shipping iteratively while building toward a long-term vision

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity.
We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. We are an equal opportunity employer, and we do

Backend Software Engineer (Evals) San Francisco

Classified Tasks (16)

Augment (13)

Human-Only (3)

Job description