Software Engineer, Build Systems / CI
OpenAI
About the Role
The Engineering Acceleration team builds and operates the foundational systems that engineers use to build, test, and ship ChatGPT, the API, and OpenAI's infrastructure.
We are looking for an engineer to help evolve OpenAI's build and continuous integration systems for a fast-growing engineering organization. This role sits at the intersection of developer productivity, build systems, distributed infrastructure, and software quality. You will work on the systems that determine how quickly and confidently engineers can move: Bazel-based builds, Buildkite pipelines, test selection, remote caching and execution, CI observability, and tooling that helps engineers understand and fix failures quickly.
Our mission is to make OpenAI one of the most productive engineering organizations in the world while preserving a high bar for correctness, reliability, and safety. The best version of this work is invisible when it succeeds: builds are fast, tests are trusted, CI failures are understandable, and engineers can focus on shipping useful systems instead of fighting infrastructure.
In This Role, You Will
Own and evolve Bazel-based build and test workflows across a large, polyglot monorepo.
Design and maintain Starlark rules, macros, toolchains, and integrations that make builds reproducible, hermetic, and easy for product teams to adopt.
Improve CI performance and reliability across Buildkite pipelines, including queue time, build time, cache hit rates, test sharding, retry behavior, and flake isolation.
Build systems that reduce unnecessary CI work through affected-target detection, dependency graph analysis, test selection, caching, batching, and smarter scheduling.
Improve local development workflows so engineers can reproduce CI behavior, debug build failures, and iterate quickly without learning every detail of the build stack.
Operate and optimize build infrastructure across Docker/OCI images, Kubernetes-based runners, cloud resources, and remote cache/execution systems.
Instrument build and CI systems with metrics, logs, traces, dashboards, and analytics so we can measure speed, reliability, cost, and developer impact.
Partner directly with product, infrastructure, and research engineering teams to understand pain points, onboard projects, debug hard build issues, and remove systemic bottlenecks.
Use modern AI tools to rethink CI failure analysis