Site Reliability Engineering Manager

Rightmove

London

On-site

GBP 125,000 - 150,000

Full time

30+ days ago

Job summary

Rightmove is seeking a Site Reliability Engineering Manager to lead its website infrastructure and operations team. In this pivotal role, you will ensure the availability and performance of the Rightmove website to a world-class standard. Your responsibilities will include managing the migration of services to Google Cloud, developing cloud engineering skills within your teams, and optimising service health through effective incident management. If you are an experienced manager with a passion for continuous improvement and operational excellence, this is an exciting opportunity to make a significant impact in a dynamic environment.

Qualifications

  • Experience managing website infrastructure and technical operations.
  • Deep understanding of DevOps and SRE principles.

Responsibilities

  • Manage and maintain datacentre and cloud infrastructure.
  • Lead migration of applications to Google Cloud.
  • Optimise service health and incident management processes.

Skills

Team Management
Operational Awareness
Technical Judgement
Incident Management
Continuous Improvement
Cloud Migration
DevOps Principles

Tools

Google Cloud Platform
Google Kubernetes Engine
GitLab
Jira
Confluence
Slack
Teams
Elastic APM
Kibana
Eggplant Monitoring
Xymon

Job description

This job is brought to you by Jobs/Redefined, the UK's leading over-50s age inclusive jobs board.

Role: Site Reliability Engineering Manager (Website Infrastructure & Operations Manager)

Location: London / Hybrid - 2 days per week in the office

Reporting to: Head of Technology Operations

The Platform and Reliability Engineering Team are responsible for the technology platforms and services that underpin the Rightmove website, ensuring it is available, secure and performing to a world-class standard. We strive to deliver annual availability of at least 99.99%, which allows roughly 52 minutes of downtime a year (under 5 minutes a month on average).
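An availability target translates directly into a downtime budget. The minimal Python sketch below shows that arithmetic; it assumes a 365-day year and a 30-day month and ignores any planned-maintenance exclusions. The function name `downtime_budget_minutes` is hypothetical and illustrative only, not part of any Rightmove tooling.

```python
# Downtime budget implied by an availability target.
# Assumptions (hypothetical, for illustration): 365-day year, 30-day month,
# no exclusions for planned maintenance.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200


def downtime_budget_minutes(availability: float, period_minutes: int) -> float:
    """Return the maximum downtime (in minutes) an availability target allows."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    # The allowed downtime is the unavailable fraction of the period.
    return (1.0 - availability) * period_minutes


if __name__ == "__main__":
    target = 0.9999  # 99.99%
    print(f"per year:  {downtime_budget_minutes(target, MINUTES_PER_YEAR):.1f} min")
    print(f"per month: {downtime_budget_minutes(target, MINUTES_PER_MONTH):.2f} min")
```

For a 99.99% target this prints about 52.6 minutes per year and 4.32 minutes per 30-day month, consistent with the "under 5 minutes a month" figure.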

The Site Reliability Engineering Manager's focus is to ensure their teams maintain our datacentre and cloud website infrastructure, safely migrate services to Google Cloud, and enable others to easily manage the reliability of production services across the Rightmove Website Estate.

A typical week as the Site Reliability Engineering Manager might involve:

  1. Ensuring the right people, process and tooling are in place to maintain a healthy, resilient, and secure datacentre and cloud website platform.
  2. Creating and managing technical plans for the migration of applications and infrastructure to Google Cloud.
  3. Developing cloud engineering and operations skills within your teams.
  4. Working through the supplier due diligence process for support contract renewals, to ensure key components remain in support.
  5. Working with engineering managers, product owners, and engineers to optimise and improve service health.
  6. Identifying, planning and implementing improvements to the incident management process.
  7. Reducing handoffs or improving flow/lead times within development teams by providing operational/infrastructure support for the platform.

We're looking for someone who:

  1. Has experience managing engineers who build and run website infrastructure and web services, and experience running website technical operations.
  2. Is highly operationally aware, understanding what it takes to maintain a healthy website infrastructure and services.
  3. Is an experienced manager who understands how to get the best out of their people and teams.
  4. Has excellent judgement and can instill this in engineers, leading them to the best outcomes on technical decisions and architecture whilst enabling their development.
  5. Is happy to dive deep into technical discussions with their team and can surface risks and issues relating to projects.
  6. Is able to keep calm and work effectively in high pressure situations.
  7. Has experience migrating infrastructure and web services from datacentres to cloud.
  8. Has deep experience and understanding of DevOps and SRE principles and practices.
  9. Always pushes for continuous improvement and has strong attention to detail.

Relevant Technology we use:

  • F5, Juniper, Arbor
  • VMware, HP 3Par
  • Google Cloud Platform
  • Google Kubernetes Engine with Anthos Service Mesh
  • Confluent Cloud
  • Incident.io
  • GitLab
  • Jira, Confluence, Slack, Teams
  • Elastic APM, Kibana
  • Eggplant Monitoring, Xymon
  • Java, Node.js, Python, JavaScript, Go