Nuvepro - Task Intelligence for the Enterprise
Anthropic · Engineering & Design - Product · San Francisco, CA | New York City, NY

Prompt Engineer, Agent Prompts & Evals

Classified Tasks (26)

Automate 0% | Augment 81% | Human-Only 19%

Augment (21)

AI assists, human decides

Build AI-first products, features, and evaluations.

technical

Support product feature and model releases by integrating engineering expertise and assessing model quality.

operational

Analyze and document Claude's behavioral quirks and capabilities to inform product design across models and domains.

analytical

Design system prompts that shape Claude's behavior across consumer and API products.

technical

Test system prompts to validate behavior and performance across products.

technical

Optimize system prompts to improve model behavior and user outcomes.

technical

Design feature-specific prompts to guide model behavior for individual product features.

technical

Test feature-specific prompts for correctness and robustness.

technical

Optimize feature-specific prompts to enhance feature-specific model behavior.

technical

Design tool prompts to integrate external tools with Claude.

technical

Test tool prompts for correct tool invocation and behavior.

technical

Optimize tool prompts to ensure reliable tool usage by Claude.

technical

Develop and refine skills for Claude to support product features and workflows.

technical

Build comprehensive evaluation suites to measure model quality and consistency.

analytical

Maintain evaluation suites across product launches and model updates.

operational

Support model rollouts to ensure smooth deployments and minimize user impact.

operational

Detect regressions in model behavior before they impact users.

analytical

Build frameworks and tools that enable teams to develop and test prompts and features with confidence.

technical

Improve frameworks and tools to increase prompt and feature development productivity and reliability.

technical

Help product teams build and integrate their initial evaluations into release processes.

operational

Iterate rapidly on prompts and features in response to evolving model capabilities.

operational

Human-Only (5)

Requires human judgment

Collaborate with product teams to align model capabilities with product experiences and ensure safe, consistent user experiences across product surfaces.

communication

Serve as the primary resource for product teams on Claude's AI infrastructure, including system prompts, tool prompts, skills, and evaluations.

communication

Manage multiple concurrent prompt engineering projects across product teams.

operational

Partner with research and safeguards teams to ensure new features meet quality and safety standards.

communication

Mentor product engineers on prompt engineering best practices.

leadership

Job description

About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the Role

We’re looking for prompt and context engineers to join our product engineering team to help build AI-first products, features, and evaluations. Your mission will be to bridge the gap between model capabilities and real product experience, working with product teams to build consistent, safe, and beneficial user experiences across all product surfaces.

You will be deeply involved in new product feature and model releases at Anthropic, combining engineering expertise with an understanding of frontier AI applications and model quality. You’ll become an expert on Claude’s behavioral quirks and capabilities and apply that knowledge to deliver the best possible user experience across models and domains. You’ll be the first resource for product teams working on Claude’s AI infrastructure: system prompts, tool prompts, skills, and evaluations.

This role requires someone who can effectively balance caring deeply about making Claude the best it can be while also supporting a wide variety of concurrent projects and efforts across many product teams.

Key Responsibilities

Prompt Engineering Excellence: Design, test, and optimize system prompts and feature-specific prompts that shape Claude’s behavior across consumer and API products.
Evaluation Development: Build and maintain comprehensive evaluation suites that ensure model quality and consistency across product launches and updates.
Cross-functional Collaboration: Partner closely with product teams, research teams, and safeguards to ensure new features meet quality and safety standards.
Model Launch Support: Play a critical role in model releases, ensuring smooth rollouts and catching regressions before they impact users.
Infrastructure Contribution: Help build and improve the frameworks and tools that allow teams to develop and test prompts and features with confidence.
Knowledge Transfer: Mentor product engineers on prompt engineering best practices and help teams build their first evaluations.
Rapid Iteration: Work in a fast-paced environment where model capabilities advance daily, requiring quick adaptation and creative problem-solving.

What We’re Looking For

Required Qualifications

5+ years of software engineering experience with Python or similar languages.
Demonstrated experience with LLMs and prompt engineering (through work, research, or significant personal projects).
Strong understanding of evaluation methodologies and metrics for AI systems.
Excellent written and verbal communication skills: you’ll need to explain complex model behaviors to diverse stakeholders.
Ability to manage multiple concurrent projects and prioritize effectively.
Experience with version control, CI/CD, and modern software development practices.

Preferred Qualifications

Experience with Claude or other frontier AI models in production settings.
Background in machine learning, NLP, or related fields.
Experience with A/B testing and experimentation frameworks (e.g., Statsig).
Familiarity with AI safety and alignment considerations.
Experience building tools and infrastructure for ML/AI workflows.
Tr
Source: Anthropic careers · scraped 2026-05-22