Site Reliability Engineer, Europe

Kosli

United Kingdom

Remote

GBP 50,000 - 90,000

21 days ago

Join a forward-thinking company as a Site Reliability Engineer, where you will play a crucial role in enhancing the reliability of software delivery in the financial services sector. This dynamic position involves managing large-scale cloud infrastructure and collaborating with development teams to ensure robust, secure, and scalable applications. You will leverage your expertise in infrastructure management, observability, and CI/CD pipelines to make a real impact on how major financial institutions deliver software. If you're passionate about quality infrastructure and thrive in a fast-paced startup environment, this is the perfect opportunity for you.

Competitive salary and generous equity

Remote-first environment

Regular team meet-ups across Europe

Budget for learning and development

A voice in shaping our product and company

Real impact on software delivery

Experience with large-scale systems and cloud infrastructure management.
Proficiency in CI/CD pipelines and monitoring/observability stacks.

Design and implement reliability into applications and infrastructure.
Manage cloud infrastructure using Terraform and AWS.
Lead security implementation and compliance checks.

Infrastructure Management

Data Analysis

Observability

Platform Development

Problem-Solving

Collaboration

SRE Practices

Monitoring Skills

Bachelor's Degree in Computer Science or related field

Terraform

AWS

GitHub Actions

GitLab

Prometheus

Grafana

Rollbar

Base pay range

GBP 50,000 - 90,000

This range is provided by Kosli. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.

Do you want to shape the future of software delivery in the financial services industry?

Kosli is looking for a Site Reliability Engineer to join our growing team. As part of a fast-paced startup, this role is about building and maintaining the large-scale data and compute cloud infrastructure that powers our SaaS platform.

About Kosli

Kosli’s mission is to change the way we deliver software in regulated industries. The world is built on mission-critical software. It calculates our bank balance. It drives our cars. It diagnoses our illnesses.

We want to empower the engineers who make this software.

When you’re regulated, every change needs to be controlled. This typically means manual paperwork, meetings, delays, and more risk.

We believe this should be automated, and we’re building technology to make this happen.

We are an ambitious group of people and we want you to join us.

We are funded by leading VC investors such as Heavybit (investors in Snyk, LaunchDarkly, CircleCI, Netlify, Tailscale).

About the Role

As a Site Reliability Engineer at Kosli, you will play a pivotal role in embedding reliability into the core of our applications and infrastructure. This position combines expertise in infrastructure management, data analysis, observability, and platform development to ensure our services are robust, secure, and scalable. You will work closely with development teams to integrate reliability into the application lifecycle, leveraging your skills in Python and other languages to build resilient systems.

Key Responsibilities

Design and Implement Reliability: Collaborate with development teams to integrate reliability into application design and development, focusing on building fault-tolerant systems.
Cloud Infrastructure Management: Manage and evolve Kosli’s cloud infrastructure using Terraform and AWS, ensuring it supports scalable and reliable application deployments.
Security and Compliance: Lead security implementation and compliance checks across our infrastructure, ensuring alignment with industry standards.
Observability and Monitoring: Own and improve our monitoring and observability stack (Prometheus, Grafana, and Rollbar, to name a few) to provide actionable insights that inform reliability improvements.
CI/CD Pipelines: Take ownership of build and deployment pipelines using GitHub Actions, ensuring smooth and reliable software delivery.
On-Premise Solutions: Lead the development of our on-premise solution for customers, focusing on reliability and scalability.
Shared Infrastructure Components: Develop and maintain shared infrastructure components for customers, including Terraform modules and shared GitHub Actions and GitLab Pipelines.
Service Level Management: Use your experience to assist in implementing and driving adoption of Service Level Agreements (SLAs), Service Level Objectives (SLOs), and error budgets to ensure alignment with business objectives and customer expectations.

You Might Be a Great Fit If You Have

Experience with Large-Scale Systems: A background in operating large-scale data platforms or applications with a focus on reliability.
Infrastructure-as-Code Expertise: Deep expertise with Terraform and AWS cloud platforms.
Monitoring and Observability Skills: Experience building and maintaining monitoring/observability stacks to drive reliability improvements.
CI/CD Proficiency: Proficiency with CI/CD pipelines, especially GitHub Actions and GitLab.
Reliability Track Record: A track record of improving system reliability and deployment processes.
Programming Skills: Familiarity with Python, Go, and shell scripting, with a focus on using these skills to enhance application reliability.
SRE Practices and Service Levels: Knowledge of modern SRE practices, including the implementation of SLAs, SLOs, and error budgets to manage service reliability and availability.
Passion for Quality Infrastructure: Passion for quality infrastructure code and modern SRE practices that prioritise reliability and scalability.
Problem-Solving and Collaboration: Strong problem-solving abilities, attention to detail, and clear communication skills to collaborate effectively in a distributed team.
Curiosity and Enthusiasm: Enthusiasm for being an early user of tools you help build and curiosity about regulated industries and compliance requirements.

What We Offer

Competitive salary and generous equity - we want you to own part of what you’re building
Remote-first environment with a focus on flexibility
Regular team meet-ups across Europe
Budget for learning and development
A voice in shaping both our product and our company
Real impact on how some of the world’s largest financial institutions deliver software

If you are excited by the idea of transforming software delivery in financial services and thrive in a fast-paced startup environment, we would love to hear from you!

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Information Technology and Engineering
Industries

Software Development, IT System Operations and Maintenance, and IT System Custom Software Development
