Senior Software Engineer: Agentic Evaluation

**Weekly Hours:** 40

**Role Number:** 200666976-0836

**Summary**

Join the team redefining what a deeply personal and integrated assistant can be.

As part of the Siri organization, you will help shape one of the world's most widely used AI assistants, powered by our next generation of Apple Intelligence, with capabilities like personal context understanding and on-screen awareness, built with privacy from the ground up. Your work will have a direct, meaningful impact on users across iOS, iPadOS, macOS, watchOS, and visionOS.

This is a rare opportunity to build at the intersection of cutting-edge AI and human-centered design, shipping technology that is centered around users and their needs.

**Description**

In this role, you'll contribute to the infrastructure, tooling, and pipelines that let us evaluate Siri reliably and at scale. You'll have meaningful autonomy in how you get there, and the work will span several growth areas as priorities evolve. The specific platforms, frameworks, and components will change over time, so we're looking for someone who can move smoothly between them and bring strong evaluation and systems engineering fundamentals to whatever the team needs next.

**Minimum Qualifications**

+ Strong programming skills in one or more compiled languages (Swift, C++ or Objective-C)

+ Python scripting skills for tooling and automation

+ Solid understanding of computer science fundamentals

+ Ability to quickly learn new technologies and adapt to evolving requirements

+ Excellent communication skills and ability to collaborate across teams

+ M.S. or B.S. in Computer Science, Machine Learning, or related field (or equivalent experience)

**Preferred Qualifications**

+ Experience staging, provisioning, or controlling test or evaluation environments to produce repeatable, deterministic conditions

+ Experience evaluating ML, LLM or agent-based systems, including familiarity with metrics, scoring methodology, or trajectory and outcome analysis

+ Experience designing or operating test infrastructure at scale, such as device provisioning, environment restore, warm pools, or continuous integration systems

+ Proficiency with Python and Swift in a production setting

+ A track record of cutting through ambiguity: adapting your approach flexibly to reach the right outcome and setting a clear path when requirements are not yet defined

+ A talent for focusing and simplifying, stripping away what is not essential and distilling complex decisions down to the factors that matter

+ A history of collaborating across teams and communicating effectively with both technical and program audiences