MLOps / Infrastructure Engineer

cfdx

London

On-site

GBP 50,000 - 90,000

Full time

2 days ago

Job summary

An innovative firm is seeking a talented individual to architect and scale foundational AI infrastructure for groundbreaking research in biology. This role involves developing robust ML training systems, optimizing resource utilization, and collaborating with a diverse team to drive impactful scientific discoveries. You'll have the opportunity to work with cutting-edge technology in a supportive environment that values curiosity and excellence. If you are passionate about making a difference in human health and thrive on complex technical challenges, this position offers a unique chance to contribute to meaningful advancements in the field.

Benefits

Growth Opportunities
Collaborative Environment
Continuous Learning
Innovative Projects

Qualifications

  • Experience in designing scalable ML infrastructure in cloud environments.
  • Strong programming skills in Python; knowledge of C++ is a plus.

Responsibilities

  • Architect and optimize ML training and inference infrastructure.
  • Collaborate with researchers to translate needs into engineering solutions.

Skills

Problem-solving
Communication
Collaboration
Technical Excellence
Ambition

Education

Bachelor's degree in Computer Science or related field
Master's degree in a relevant field

Tools

Kubernetes
Docker
Terraform
AWS
GCP
Azure
Kubeflow
Ray
Airflow
Metaflow

Job description

About Prima Mente

Prima Mente's goal is to understand the brain deeply, protect it from neurological disease, and enhance it in health. We do this by generating our own data, building brain foundation models, and translating discovery into real clinical and research impact.

Role focus - Foundation Models for Biology

Architect, build, and scale our foundational AI infrastructure. You'll ensure our ML models are developed and deployed on highly performant, scalable, and reliable systems. Your expertise will enable rapid experimentation and seamless deployment of large-scale multi-omic models, empowering researchers to advance groundbreaking scientific discoveries.

  • Architect, develop, and optimize robust ML training and inference infrastructure capable of supporting large-scale genomic foundation models.
  • Design and implement scalable and efficient distributed computing platforms leveraging cloud (AWS/GCP/Azure) and HPC clusters.
  • Develop highly automated, reproducible data pipelines and CI/CD workflows that accelerate model development, testing, and deployment.
  • Performance-tune infrastructure and models, optimizing resource utilization (GPU/TPU) and significantly improving computational efficiency.
  • Collaborate cross-functionally with ML researchers, bioinformaticians, and scientists to translate research needs into scalable engineering solutions.
  • Ensure system reliability, robustness, and high availability, proactively implementing comprehensive monitoring, logging, and alerting solutions.
  • Champion infrastructure-as-code (IaC) practices, promoting clarity, reproducibility, security, and auditability.

Expected Growth

  • Within 1 month, you'll be responsible for running initial experiments with state-of-the-art machine learning models, reviewing and implementing cutting-edge research papers, and optimizing existing code for efficiency and accuracy.
  • Within 3 months, you'll have created and taken direct ownership of a prototype model architecture, demonstrated significant algorithmic improvements, and contributed to scaling methods for large-scale data ingestion and training.
  • Within 6 months, you'll have developed a high-performance version of a foundation model, implemented key algorithmic optimizations that boost scalability and throughput, and published internal benchmarks demonstrating significant research impact.

Why Join Us

  • Meaningful Impact: Contribute directly to research infrastructure that powers discoveries potentially impacting millions of lives.
  • Innovation & Autonomy: Work at the forefront of AI and multi-omics, with the freedom to propose and implement state-of-the-art infrastructure solutions.
  • Exceptional Team: Collaborate with talented colleagues from diverse backgrounds across ML, bioinformatics, and engineering.
  • Growth Opportunities: Continuous learning and development in a rapidly advancing technical field.

Culture Insight

What we are doing is extremely hard. Prima Mente is for great people. We are team players who appreciate challenges, want to be hands-on, and thrive on curiosity by throwing away assumptions. We are focused on excellence at pace and huge personal growth. We are strong communicators who are highly disciplined and rigorous.

Prima Mente operates with a flat organizational structure. We gain and share knowledge by contributing to multiple opportunities. Leadership is given to those who show initiative and consistently deliver excellence.

We arrange our lives so we can work in person as much as possible.

Our Values

  • Exceptional performance at exceptional pace
    • The solutions we build demand uncompromising quality and rigour.
    • The problems we are solving are grave and present.
  • Inquisitive discovery
    • We embrace curiosity and creativity.
    • Every question is a path to a transformational breakthrough.
  • Radical candour
    • We practice unwavering honesty and transparency in all our challenges and interactions.
  • Purposeful individuality
    • Every individual in our team is celebrated for their identity, uniqueness, and experiences.
    • We are invested in each person's bespoke personal development.
    • Nurturing individuality will supercharge our collective purpose and spirit.
  • Patient impact at scale
    • We have a steadfast commitment to improving the health and well-being of patients globally.
    • Every experiment run, every dataset analysed, and every innovation developed is a step towards achieving impact at scale.

Who You Are

  • Ambitious and Impact-Driven: You're inspired by working at the forefront of AI and biology, motivated by challenges that can significantly advance human health.
  • Technical Excellence: You thrive in highly technical, complex environments and have a track record of turning cutting-edge research into robust production systems.
  • Collaborative & Communicative: You excel at collaborating across disciplines, clearly articulating complex ideas, and driving alignment among research and engineering teams.

Ideal experience

  • Demonstrated ability to solve complex problems independently, with exceptional troubleshooting and system debugging skills.
  • Excellent communication skills and experience collaborating within multidisciplinary teams.
  • Experience designing and deploying scalable, distributed ML infrastructure in cloud and/or hybrid HPC environments.
  • Proficiency in Kubernetes, Docker, Terraform (or equivalent infrastructure automation tools), and cloud services (AWS, GCP, Azure).
  • Deep experience with ML workflow orchestration tools (e.g., Kubeflow, Ray, Airflow, Metaflow).
  • Excellent programming skills in Python; experience with Bash, Go, or C++ is beneficial.
  • Strong understanding of ML frameworks (PyTorch, TensorFlow, JAX) and familiarity with distributed training methods, GPU/TPU acceleration, and performance optimization libraries (e.g., XLA, NCCL).
  • Excellent understanding of software development best practices, CI/CD, and automation.
  • Experience with bioinformatics or biological data handling.
  • Knowledge of data governance, compliance, and security standards relevant to healthcare or biotech.

Interview Process

Our intention is to run our interview process end to end within 2 weeks. You will interact with co-founders Ravi and Hannah, as well as every member of the technical team.
