Site Reliability Engineer (SRE) (Remote)

Remotestar

Cambourne

Hybrid

GBP 60,000 - 80,000

30+ days ago

Join a leading multinational IT services company as a Site Reliability Engineer, where you'll play a crucial role in maintaining the performance and reliability of applications. This exciting position offers the chance to work with cutting-edge technology and contribute to digital transformation projects. You'll collaborate closely with developers, tackle operational challenges, and implement monitoring solutions to ensure seamless user experiences. The role begins remotely, transitioning to a hybrid model, allowing for flexibility while being part of a forward-thinking team. If you're passionate about technology and thrive in dynamic environments, this opportunity is perfect for you.

5-9 years of experience in Site Reliability Engineering.
Experience with cloud-native applications and monitoring tools.

Handle operational issues like production failures and infrastructure problems.
Ensure the availability and performance of applications.

Skills: Site Reliability Engineering, Problem Solving, Communication Skills, Capacity Planning, Performance Tuning, Distributed Systems, Incident Management, AppDynamics, Splunk, GCP Operations Suite, Azure, GCP

About the Company:

At RemoteStar, we’re hiring for one of our clients, a leading multinational IT services and consulting company specializing in digital transformation, cloud solutions, and AI-driven innovation. With a strong global presence, the company partners with enterprises across various industries to deliver cutting-edge technology solutions.

Job Title: Site Reliability Engineer (SRE)

Experience: 5 to 9 years

Location: Pan India (Remote)

Work Mode: Initially remote for this project. Later, the client will transition to a hybrid model (3 days in the office per week).

Working Hours: 1 PM to 10 PM and 2 PM to 11 PM, 5 days a week.

Industry Preference: A healthcare background is a must-have.

Responsibilities:

Deal with operational issues such as production failures, infrastructure problems, security, and monitoring.
Ensure the availability, performance, and scalability of a website or application.
Work closely with developers to identify and fix potential issues before they cause problems for users.
Monitor systems and create plans for responding to incidents.
Take part in capacity planning and performance tuning so that the site can handle increased traffic.
Have a deep understanding of distributed systems in order to troubleshoot and optimize them.
Be familiar with monitoring tools such as AppDynamics, Splunk, and GCP Operations Suite.
Understand how different types of databases work in order to troubleshoot issues effectively.
Have experience working with cloud-native applications and managing them effectively.
Communicate clearly and concisely about system alerts or outages to other team members.
Deal with unexpected outages or performance issues.
Have experience with monitoring, configuration management, and automation tools.
Have solid experience with Azure and GCP.

Note: The candidate should be able to mature the SRE practice across the division and be comfortable acting as a champion and leader in the SRE space.
