ML Research Engineer - Hardware Codesign
OpenAI
About the Team
OpenAI’s Hardware organization develops AI-native silicon and system-level solutions for the unique demands of advanced AI workloads. Building on efforts like Jalapeño, the team is developing future generations of AI-native silicon and tightly integrated systems to power the next generation of frontier models. By co-designing chips, systems, tools, and methodologies, the team helps deliver faster, more efficient, and production-ready hardware for OpenAI’s supercomputing platform.
About the Role
We’re seeking a Research-Hardware Codesign Engineer to operate at the boundary between model research and silicon/system architecture. You’ll help shape the numerics, architecture, and technology bets of future OpenAI silicon in collaboration with both Research and Hardware.
Your work will include debugging gaps between rooflines and reality, writing quantization kernels, derisking numerics via model evals, quantifying system architecture tradeoffs, and implementing novel numeric RTL. This is a hands-on role for people who go looking for hard problems, get to ground truth, and drive solutions to production. Strong prioritization and clear, honest communication are essential.
Location: San Francisco, CA (Hybrid: 3 days/week onsite)
Relocation assistance available.
In this role you will:
Build on our roofline simulator to track evolving workloads, and deliver analyses that quantify the impact of system architecture decisions and support technology pathfinding.
Debug gaps between performance simulation and real measurements; clearly communicate root cause, bottlenecks, and invalid assumptions.
Write emulation kernels for low-precision numerics and lossy compression schemes, and get Research the information they need to trade off efficiency against model quality.
Prototype numerics modules by pushing RTL through synthesis; hand off novel numerics cleanly, or occasionally own an RTL module end-to-end.
Proactively pull in new ML workloads, prototype them with rooflines and/or functional simulation, and drive initial evaluation of new opportunities or risks.
Understand the whole picture from ML science to hardware optimization, and slice this end-to-end objective into near-term deliverables.
Build ad-hoc collaborations across teams with very different goals and areas of expertise, and keep progress unblocked.
Communicate design tradeoffs clearly with explicit assumptions and confidence levels; produce a trail of evidence that e