Interpretability Researcher

AI Safety Institute

London

On-site

GBP 80,000 - 100,000

30+ days ago

Job summary

Join a pioneering team at the forefront of AI safety research, focusing on mechanistic interpretability. This role offers the chance to work with exceptional researchers and contribute to groundbreaking advancements in ensuring AI models are safe and reliable. You will have the autonomy to explore ambitious research questions, supported by unparalleled resources and a strong culture of learning and development. If you are passionate about tackling the challenges of AI safety and want to make a significant impact, this opportunity is perfect for you.

Benefits

Pension options

Coaching and mentorship

Access to high-performance computing resources

Flexible working hours

Qualifications

  • Experience in mechanistic interpretability and deep learning research.
  • Strong academic background with evidence of significant contributions.

Responsibilities

  • Lead research on mechanistic interpretability and automated scheming detection.
  • Collaborate with world-class researchers and enhance scientific standards.

Skills

Mechanistic interpretability research

Deep learning breakthroughs

Large language models (e.g., GPT-4)

Academic excellence

Mentorship and feedback

Communication skills

Multi-disciplinary teamwork

Python programming

Education

PhD in a relevant field

Master's degree in AI or related field

Tools

Nvidia Grace-Hopper GPUs

Job description

Mechanistic Interpretability

AISI is launching a brand-new Mechanistic Interpretability team to research a fundamental question: how can we tell if a model is scheming? This is an ambitious bet to bring interpretability as a field into prime time. We believe this is a vital challenge that mechanistic interpretability can help solve, ensuring that dangerous-capability evaluations can reliably determine whether models are safe to release even when the models themselves are capable of gaming the evals. We also think it can lead to an entirely new field of alignment evaluations and make substantial contributions to the problem of technical AI safety.

To launch this project we're looking for a team lead, research scientists and research engineers. Apply now to join the largest technical AI safety lab on the planet - help us make this happen!

Role Summary

This team will have a large amount of scientific autonomy, with the ability to chase ambitious research bets. Your responsibilities may involve any of the following:

  • Supervised fine-tuning (SFT) of large models for scheming.
  • Training sparse autoencoders (or fine-tuning open-source SAEs).
  • Circuit discovery/analysis.
  • Automated scheming detection.
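For a concrete flavour of the SAE work mentioned above, here is a minimal sketch of a sparse autoencoder of the kind used in mechanistic interpretability: it reconstructs model activations through an overcomplete ReLU bottleneck with an L1 sparsity penalty. All sizes, names, and the numpy setup are illustrative assumptions, not AISI's actual stack.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: activation size and (overcomplete) dictionary size.
d_model, d_hidden = 16, 64
W_enc = rng.normal(0, 0.1, (d_model, d_hidden))
b_enc = np.zeros(d_hidden)
W_dec = rng.normal(0, 0.1, (d_hidden, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode activations x into sparse features f, then reconstruct."""
    f = np.maximum(0.0, x @ W_enc + b_enc)   # ReLU encoder
    x_hat = f @ W_dec + b_dec                # linear decoder
    return f, x_hat

def sae_loss(x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparse features."""
    f, x_hat = sae_forward(x)
    recon = np.mean((x - x_hat) ** 2)
    sparsity = l1_coeff * np.mean(np.abs(f))
    return recon + sparsity

# Stand-in for a batch of residual-stream activations from a language model.
batch = rng.normal(size=(8, d_model))
loss = sae_loss(batch)
```

In practice this loss would be minimised with gradient descent over activations harvested from a frontier model, and the learned dictionary directions inspected as candidate interpretable features.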

You’ll receive coaching from your manager and mentorship from the research directors at AISI (including Geoffrey Irving and Yarin Gal). You will also regularly interact with world-famous researchers and other incredible staff (including alumni from Anthropic, DeepMind, OpenAI and ML professors from Oxford and Cambridge). We have a very strong learning & development culture to support this, including Friday afternoons devoted to deep reading and multiple weekly paper reading groups. From a compute perspective, you'll have unparalleled access to resources including 5,448 Nvidia Grace-Hopper GPUs (e.g., H100s).

Person Specification

You may be a good fit if you have some of the following skills, experience and attitudes:

  • Hands-on mechanistic interpretability research experience.
  • Experience working within a research team that has delivered multiple exceptional scientific breakthroughs in deep learning (or a related field). We’re looking for evidence of an exceptional ability to drive progress.
  • Comprehensive understanding of large language models (e.g. GPT-4), including both a broad understanding of the literature and hands-on experience with pre-training or fine-tuning LLMs.
  • Strong track-record of academic excellence (e.g. multiple spotlight papers at top-tier conferences).
  • Experience improving scientific standards and rigour through mentorship and feedback.
  • Strong written and verbal communication skills.
  • Experience working with world-class multi-disciplinary teams, including both scientists and engineers (e.g. in a top-3 lab).
  • Experience acting as a bar raiser for interviews.

Salary & Benefits

We are hiring individuals across a range of seniority and experience levels within the research unit, and this advert allows you to apply for any of the roles within that range. We will discuss and calibrate with you as part of the process. The full range of salaries available is as follows:

  • L3: £65,000 - £75,000
  • L4: £85,000 - £95,000
  • L5: £105,000 - £115,000
  • L6: £125,000 - £135,000
  • L7: £145,000

There are a range of pension options available which can be found through the Civil Service website.

Selection Process

In accordance with the Civil Service Commission rules, the following list contains all selection criteria for the interview process.

Required Experience

We select based on skills and experience in the following areas:

  • Mechanistic interpretability experience
  • Research problem selection
  • Research science
  • Writing code efficiently
  • Python
  • Frontier model architecture knowledge
  • Frontier model training knowledge
  • Model evaluations knowledge
  • AI safety research knowledge
  • Written communication
  • Verbal communication
  • Teamwork
  • Interpersonal skills
  • Tackling challenging problems
  • Learning through coaching

Desired Experience

We may additionally factor in experience with any of the areas that our work-streams specialise in:

  • Autonomous systems
  • Cyber security
  • Chemistry or Biology
  • Safeguards
  • Safety Cases
  • Societal Impacts