Anthropic · Technical Program Management · New York City, NY; San Francisco, CA | New York City, NY; Seattle, WA
Incident Response Manager - Product & Engineering
Classified Tasks (16)
Automate 0% · Augment 56% · Human-Only 44%
Augment (9)
AI assists, human decides
Build the incident response management function, establishing processes, tooling, and operational standards for handling incidents at scale
operational
Ensure accurate information flow and that no critical actions fall through the cracks during incidents
operational
Create and maintain incident response runbooks and playbooks
administrative
Own incident communications end-to-end, coordinating real-time internal updates and external communications including status pages, customer outreach, and stakeholder updates
communication
Participate in blameless incident reviews, provide operational context, and drive follow-through on critical remediations to prevent recurrence
analytical
Partner with engineering teams to develop and maintain incident response policies, procedures, and escalation frameworks that scale with organizational growth
operational
Collaborate with engineering, product, security, legal, go-to-market, and leadership teams to continuously improve incident detection, response, and post-incident learning
leadership
Establish and implement incident response tooling to support detection, coordination, and remediation
technical
Define and implement escalation frameworks for incidents across severity levels
operational
Human-Only (7)
Requires human judgment
Serve as an on-call incident commander, driving coordinated responses across technical and non-technical stakeholders during incidents of varying severity
leadership
Manage multiple active incidents simultaneously
operational
Engage appropriate stakeholders at the right time to bring order and direction to fast-moving, ambiguous incidents
leadership
Operate effectively during incidents even when runbooks or playbooks do not yet exist
leadership
Ensure incident communications reflect commitments to safety, transparency, and accuracy
communication
Coordinate across engineering, product, security, legal, go-to-market, and leadership to ensure incident responses are timely, clear, and accountable
operational
Act as the operational backbone for incident handling by coordinating roles, responsibilities, and information flow during incidents
operational
Job description
About Anthropic
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
About the Role
We are looking for an Incident Response Manager to serve as the operational backbone of how Anthropic handles incidents. When things go wrong, you are the person who makes sure the right people are in the room, the right information is flowing, and nothing falls through the cracks. The right person for this role brings structure and rigor to high-volume, high-stakes situations without waiting for a playbook to be handed to them.
You will work across engineering, product, security, legal, go-to-market, and leadership to ensure Anthropic responds to incidents with speed, clarity, and accountability. This is not a role where you follow existing runbooks; it is a role where you write them, and where you operate effectively even when the runbook does not yet exist.
Responsibilities
Build the incident response management function, establishing the processes, tooling, and operational standards that define how we handle incidents at scale
Serve as an on-call incident commander, driving coordinated response across technical and non-technical stakeholders during incidents of varying severity, including managing multiple active incidents simultaneously
Engage the right people at the right time, with a strong sense of urgency, bringing order and direction to fast-moving, ambiguous situations
Own incident communications end-to-end, from real-time internal coordination to external channels like status pages, direct customer outreach, and stakeholder updates, ensuring they reflect Anthropic's commitments to safety, transparency, and accuracy
Participate in blameless incident reviews, contributing operational context and helping drive follow-through on critical remediations so the same class of incident does not recur
Partner with engineering teams to develop and maintain incident response policies, procedures, and escalation frameworks that scale with Anthropic's growth
Partner with engineering, product, security, legal, and go-to-market teams to continuously improve how the organization detects, responds to, and learns from incidents
You May Be a Good Fit If You
Have 5+ years of experience in incident management, with direct experience managing technical product or infrastructure incidents (not exclusively security or trust and safety)
Have built or significantly shaped an incident response program, ideally at a high-growth startup or in an environment where you had to create structure rather than inherit it
Demonstrate a strong sense of ownership and urgency, with the ability to operate independently and make sound decisions under pressure without waiting for direction
Are comfortable working in unprecedented situations where processes are still being defined and guidance may be incomplete or conflicting, leaving things better than you found them
Have a track record of effective cross-functional collaboration, particularly with engineering, security, legal, communications, go-to-market, and executive leadership
Bring a blameless, learning-oriented mindset to incident reviews, focused on systemic improvement rather than individual fault
Have experience with cloud infrastructure incidents and enough technical depth across the stack to engage meaningfully with engineering teams during response, including comfort navigating distributed systems, monitoring