Interpretability Researcher

AI Safety Institute

London

On-site

GBP 80,000 - 100,000

30+ days ago

Job summary

Join a pioneering team at the forefront of AI safety research, focusing on mechanistic interpretability. This role offers the chance to work with exceptional researchers and contribute to groundbreaking advancements in ensuring AI models are safe and reliable. You will have the autonomy to explore ambitious research questions, supported by unparalleled resources and a strong culture of learning and development. If you are passionate about tackling the challenges of AI safety and want to make a significant impact, this opportunity is perfect for you.

Benefits

Pension options

Coaching and mentorship

Access to high-performance computing resources

Flexible working hours

Qualifications

  • Experience in mechanistic interpretability and deep learning research.
  • Strong academic background with evidence of significant contributions.

Responsibilities

  • Lead research on mechanistic interpretability and automated scheming detection.
  • Collaborate with world-class researchers and enhance scientific standards.

Skills

Mechanistic interpretability research

Deep learning breakthroughs

Large language models (e.g., GPT-4)

Academic excellence

Mentorship and feedback

Communication skills

Multi-disciplinary teamwork

Python programming

Education

PhD in a relevant field

Master's degree in AI or related field

Tools

Nvidia Grace-Hopper GPUs

Job description

Mechanistic Interpretability

AISI is launching a brand-new Mechanistic Interpretability team to research a fundamental question: how can we tell if a model is scheming? This is an ambitious bet to bring interpretability as a field into prime time. We believe this is a vital challenge that mechanistic interpretability can help solve, ensuring that dangerous-capability evaluations can reliably determine whether models are safe to release even when the models themselves are capable of gaming the evals. We also think it can lead to an entirely new field of alignment evaluations and make substantial contributions to the problem of technical AI safety.

To launch this project we're looking for a team lead, research scientists and research engineers. Apply now to join the largest technical AI safety lab on the planet - help us make this happen!

Role Summary

This team will have a large amount of scientific autonomy, with the ability to chase ambitious research bets. Your responsibilities may involve any of the following:

  • Supervised fine-tuning (SFT) of large models for scheming.
  • Training sparse autoencoders (or fine-tuning open-source SAEs).
  • Circuit discovery/analysis.
  • Automated scheming detection.
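For a concrete flavour of the SAE work mentioned above, here is a minimal sketch of a sparse autoencoder of the kind used in mechanistic interpretability: it reconstructs model activations through an overcomplete ReLU bottleneck with an L1 sparsity penalty. All sizes, names, and the numpy setup are illustrative assumptions, not AISI's actual stack.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: activation size and (overcomplete) dictionary size.
d_model, d_hidden = 16, 64
W_enc = rng.normal(0, 0.1, (d_model, d_hidden))
b_enc = np.zeros(d_hidden)
W_dec = rng.normal(0, 0.1, (d_hidden, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode activations x into sparse features f, then reconstruct."""
    f = np.maximum(0.0, x @ W_enc + b_enc)   # ReLU encoder
    x_hat = f @ W_dec + b_dec                # linear decoder
    return f, x_hat

def sae_loss(x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparse features."""
    f, x_hat = sae_forward(x)
    recon = np.mean((x - x_hat) ** 2)
    sparsity = l1_coeff * np.mean(np.abs(f))
    return recon + sparsity

# Stand-in for a batch of residual-stream activations from a language model.
batch = rng.normal(size=(8, d_model))
loss = sae_loss(batch)
```

In practice this loss would be minimised with gradient descent over activations harvested from a frontier model, and the learned dictionary directions inspected as candidate interpretable features.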

You’ll receive coaching from your manager and mentorship from the research directors at AISI (including Geoffrey Irving and Yarin Gal). You will also regularly interact with world-famous researchers and other incredible staff (including alumni from Anthropic, DeepMind, OpenAI and ML professors from Oxford and Cambridge). We have a very strong learning & development culture to support this, including Friday afternoons devoted to deep reading and multiple weekly paper reading groups. From a compute perspective, you'll have unparalleled access to resources including 5,448 Nvidia Grace-Hopper GPUs (e.g., H100s).

Person Specification

You may be a good fit if you have some of the following skills, experience and attitudes:

  • Hands-on mechanistic interpretability research experience.
  • Experience working within a research team that has delivered multiple exceptional scientific breakthroughs in deep learning (or a related field). We’re looking for evidence of an exceptional ability to drive progress.
  • Comprehensive understanding of large language models (e.g. GPT-4), including both a broad understanding of the literature and hands-on experience with pre-training or fine-tuning LLMs.
  • Strong track-record of academic excellence (e.g. multiple spotlight papers at top-tier conferences).
  • Experience improving scientific standards and rigour through mentorship and feedback.
  • Strong written and verbal communication skills.
  • Experience working with world-class multi-disciplinary teams, including both scientists and engineers (e.g. in a top-3 lab).
  • Experience acting as a bar raiser for interviews.

Salary & Benefits

We are hiring individuals across a range of seniority and experience levels within the research unit, and this advert allows you to apply for any of the roles within that range. We will discuss and calibrate with you as part of the process. The full range of salaries available is as follows:

  • L3: £65,000 - £75,000
  • L4: £85,000 - £95,000
  • L5: £105,000 - £115,000
  • L6: £125,000 - £135,000
  • L7: £145,000

There are a range of pension options available which can be found through the Civil Service website.

Selection Process

In accordance with the Civil Service Commission rules, the following list contains all selection criteria for the interview process.

Required Experience

We select based on skills and experience in the following areas:

  • Mechanistic interpretability experience
  • Research problem selection
  • Research science
  • Writing code efficiently
  • Python
  • Frontier model architecture knowledge
  • Frontier model training knowledge
  • Model evaluations knowledge
  • AI safety research knowledge
  • Written communication
  • Verbal communication
  • Teamwork
  • Interpersonal skills
  • Tackling challenging problems
  • Learning through coaching

Desired Experience

We may additionally factor in experience with any of the areas that our work-streams specialise in:

  • Autonomous systems
  • Cyber security
  • Chemistry or Biology
  • Safeguards
  • Safety Cases
  • Societal Impacts