Researcher, Recursive Self-Improvement Safety
OpenAI
About the team
Preparedness is a critical Safety Research team at OpenAI, focused on mitigating AI threats that could scale to an extreme level of severity.
Our work involves:
Tracking and prediction. Monitoring and predicting the evolving misalignment propensities and capabilities of frontier AI systems.
Mitigation. Keeping misuse safeguards, alignment tools, and security measures on track to adequately address extreme threats that might arise in the future.
Coordination. Setting mitigation targets by maintaining OpenAI’s preparedness framework, and partnering with other staff to achieve these targets.
This is urgent, fast-paced work that has far-reaching implications for the company and for society.
About the role
Preparedness is hiring strong technical executors to support preparations for recursive self-improvement (RSI). This work relies on reasoning about problems that might not exist now but might exist in the future, so it’s especially important that people in this role have strong research taste and think strategically.
The role is wide-ranging and covers any mitigation for loss-of-control risk. This spans designing and implementing better pre-deployment risk assessments, control measures, and RSI-relevant training interventions, as well as turning one’s technical work into established institutional practices.
Below is a subset of our focus areas:
Scalable oversight: Establishing practices for monitoring and overseeing model misbehavior that remain effective in superhuman model capability regimes, with a focus on bridging from today’s monitoring approaches to future-proof ones.
Automated auditing: As model capabilities increase, we’ll increasingly rely on automated approaches for finding the most severe forms of model misalignments. We’ll both need to sift through large swaths of production traffic to find the most egregious misalignments, and reliably elicit tail risk