Abuse Investigator (AI Self-Improvement Risk), San Francisco
Classified Tasks (13)
Augment (13)
AI assists, human decides
Identify cases where models exhibit autonomous or agentic behavior (e.g., chaining capabilities, increasing independence, capability expansion) that introduces safety risks
analytical
Investigate model behavior to determine whether outputs demonstrate agentic or autonomous patterns
analytical
Review investigative leads related to potential agentic or autonomous model behaviors
analytical
Detect behaviors not explicitly intended, understood, or covered by existing safeguards
analytical
Detect and analyze multi-step behaviors such as planning, capability chaining, tool use, persistence, and workaround behavior
analytical
Review complex or sensitive model behaviors and edge-case outputs
analytical
Conduct investigations of complex systems where behavior emerges across multiple steps, tools, or interactions
analytical
Distinguish between normal task execution and concerning patterns such as persistence, workaround behavior, or capability expansion
analytical
Develop signals and tracking strategies to proactively identify emerging agentic risk patterns across the platform
technical
Identify gaps in existing safeguards, evaluations, or monitoring systems and propose improvements
analytical
Communicate investigation findings to technical, policy, and leadership stakeholders
communication
Provide analysis and investigation outputs to enable partner teams to develop data-backed policies and safety mitigations
communication
Document investigation procedures, findings, and recommended mitigations
administrative