**Weekly Hours:** 40
**Role Number:** 200668263-3543
**Summary**
Are you a senior engineer who can keep large, AI-augmented systems running
reliably at Apple scale? Apple's Stability Engineering team is looking for a
seasoned engineer to join our Core team in San Diego. We build and operate the
platforms, services, and infrastructure that turn crash reports from Apple
devices into actionable engineering insights. You'll work on systems where
LLMs and agents are already part of the production fabric — evolving them,
hardening them, and using AI tools to extend what a small team can deliver.
**Description**
Our team owns the end-to-end platform behind stability analysis at Apple:
symbolication of crash logs across the company's hardware portfolio, the data
pipelines that aggregate and cluster crash logs, and the applications and
services that engineers across Apple use every day to drive operating-system
quality. This role is about keeping that platform healthy, extending it
deliberately, and making the engineering team itself more effective by using
AI tools well.
Day to day, you'll spend most of your time on the engineering work of running
real systems: tuning evaluation infrastructure, tightening operational
controls, improving auditability and debug trails, and scaling the workflows
our analysts rely on. When new capabilities are needed, you'll prototype and
integrate them into the platform. You'll partner closely with stability
analysts who are domain experts in OS reliability, and with the broader team
responsible for symbolication, ETL, and service infrastructure. You'll also
be expected to use AI-assisted development tools fluently to investigate
issues, refactor at scale, and ship more with a small team.
We're looking for someone with the rigor of a seasoned production engineer
who is also comfortable operating systems that include LLMs and agents as
first-class components. If you enjoy taking responsibility for a complex,
already-running platform and making it steadily better, we want to talk.
**Minimum Qualifications**
+ 5+ years of professional software engineering experience building and operating production systems
+ BS in Computer Science or a related field, or equivalent practical experience
+ Fluent use of AI-assisted development tools (coding agents, code review assistants, etc.) to work effectively at scale
+ Demonstrated experience designing and scaling distributed systems (load balancing, active-active topologies, capacity planning, throughput-bound services)
+ Track record of maintaining and evolving production services — observability, operational controls, incident response, and steady iteration on existing systems
+ Strong full-stack instincts; comfortable spanning data infrastructure, backend services, and the user-facing surfaces that consume them
+ Proven ability to operate independently on ambiguous, open-ended problems where the right answer is not obvious
**Preferred Qualifications**
+ Experience operating LLM- or agent-based features in production environments over time
+ Experience building or maintaining evaluation harnesses, audit trails, or replay infrastructure for AI systems
+ Background in developer tools, observability, crash/stability analysis, or other operating-system-quality domains
+ Familiarity with one or more of: Ruby on Rails, Node.js/TypeScript, Python for production services
+ Experience working in environments with significant deferred scalability work (capacity-constrained, long-lead-time infrastructure)