Join a pioneering team at the forefront of AI safety and governance, where your expertise will help shape the future of AI evaluations. This role offers a unique opportunity to engage in deep technical research, collaborating with top-tier researchers and engineers to enhance the rigor and reliability of AI system evaluations. You will contribute to developing innovative methods for measuring and predicting AI capabilities, ensuring that our insights are robust and impactful. With a strong emphasis on mentorship and professional growth, this position is ideal for those passionate about making a direct policy impact through rigorous scientific research.
AISI’s Science of Evaluations team will conduct applied and foundational research focused on two areas at the core of our mission: (i) measuring existing frontier AI system capabilities and (ii) predicting the capabilities of a system before running an evaluation.
Measurement of Capabilities: The goal is to develop and apply rigorous scientific techniques for measuring frontier AI system capabilities, so that results are accurate, robust, and useful for decision-making. This is a nascent area of research that supports one of AISI's core products: conducting tests of frontier AI systems and feeding back results, insights, and recommendations to model developers and policy makers.
The team will be an independent voice on the quality of our testing reports and the limitations of our evaluations. You will collaborate closely with researchers and engineers from the workstreams who develop and run our evaluations, getting into the details of their key strengths and weaknesses, proposing improvements, and developing techniques to get the most out of our results.
The key challenge is increasing confidence in our claims about system capabilities, grounded in solid evidence and analysis. Directions we are exploring include:
Predictive Evaluations: The goal is to develop approaches to estimate the capabilities of frontier AI systems on tasks or benchmarks, before they are run. Ideally, we would be able to do this at some point early in the training process of a new model, using information about the architecture, dataset, or training compute. This research aims to provide us with advance warning of models reaching a particular level of capability, where additional safety mitigations may need to be put in place. This work is complementary to both safety cases—an AISI foundational research effort—and AISI’s general evaluations work.
This topic is currently an area of active research, and we believe it is poised to develop rapidly. We are particularly interested in developing predictive evaluations for complex, long-horizon agent tasks, since we believe this will be the most important type of evaluation as AI capabilities advance. You will help develop this field of research, both by direct technical work and via collaborations with external experts, partner organizations, and policy makers.
Across both focus areas, there will be significant scope to contribute to the overall vision and strategy of the Science of Evaluations team as an early hire. You will receive coaching from your manager and mentorship from the research directors at AISI (including Geoffrey Irving and Yarin Gal), and work closely with talented Policy/Strategy leads, Research Engineers, and Research Scientists.
This role offers the opportunity to progress deep technical work at the frontier of AI safety and governance. Your work will include:
To set you up for success, we are looking for some of the following skills, experience, and attitudes, but we are flexible in shaping the role to your background and expertise.
We are hiring individuals across a range of seniority and experience within the research unit, and this advert allows you to apply for any of the roles within this range. We will discuss and calibrate the level with you as part of the process. The full range of salaries available is as follows:
The Department for Science, Innovation and Technology offers a competitive mix of benefits including:
In accordance with the Civil Service Commission rules, the following list contains all selection criteria for the interview process.
The interview process may vary from candidate to candidate; however, you should expect a typical process to include some technical proficiency tests, discussions with a cross-section of our team at AISI (including non-technical staff), and conversations with your workstream lead. The process will culminate in a conversation with members of the senior team here at AISI.
Candidates should expect to go through some or all of the following stages once an application has been submitted:
We select based on skills and experience regarding the following areas: