Junior Site Reliability Engineer, Azure Cloud Platform - 30099

TN United Kingdom

London

GBP 35,000 - 55,000

Job description

Client:

Splunk Inc

Location:

London, United Kingdom

Job Category:

Other

EU work permit required:

Yes

Job Reference:

aeffd6cbcb56

Posted:

14.03.2025

Expiry Date:

28.04.2025

Job Description:

Splunk is here to build a safer and more resilient digital world. The world's leading enterprises use our unified security and observability platform to keep their digital systems secure and reliable. While customers love our technology, it's our people that make Splunk stand out as an amazing career destination and why we've won so many awards as a best place to work. If you become a Splunker, we want your whole, authentic self, what we call your "million data points". So bring your work experience, problem-solving skills and talent, of course, but also bring your joy, your passion and all the things that make you, you.

Role Summary

The Cloud organization at Splunk focuses on building and maintaining robust and resilient platform solutions for SaaS hosting of Splunk’s enterprise software. Our main technologies are cloud-infrastructure based, with a focus on Puppet and Terraform. The TechOps team is responsible for monitoring and resolving, 24/7, issues that affect the availability and performance of Splunk for our cloud customers. As the authority on our customers’ experience, the TechOps team provides a backstop for all staff on any questions or issues that arise during their shift within the team’s area of technical expertise. Each TechOps engineer leads their respective queue and ensures that all requests coming into it are addressed in a timely manner.

What you'll get to do

4 x 10-hour shifts: Sunday to Wednesday or Wednesday to Saturday, including nights and weekends.

Provide technical support for the Splunk Cloud fleet.
Perform impact assessments and problem solving according to established procedures.
Document issues, remediation steps, and help with follow-up problem management.
Lead support cases and manage the queue.
Assist other TechOps engineers on your shift with complex tasks.
Represent the TechOps team in meetings and process changes, and make recommendations on new procedures and processes.
Use the internal tools to restore normal service operations as quickly as possible to minimize the impact to business operations during escalated incidents.
Lead by example and drive the core values of the company.
Always ensure a quality customer experience.
Be able to work nights, weekends, and swing shifts.

You love large, complex systems. You have experience working on distributed systems or a passion for finding edge cases that appear at scale. You're interested in taking a small one-off task and implementing it across several thousand machines at once. “Can this process be automated?” is a question you constantly ask yourself. Data drives your decisions and excites you; you make decisions based on numbers rather than assumptions. If an issue arises, you strive to be alerted before our customers notice.

Must-have Qualifications

Knowledge of the Linux Operating System.
Knowledge of Azure.
Knowledge of a programming language such as Python or Golang.

Nice-to-have Qualifications

We’ve taken special care to separate the must-have qualifications from the nice-to-haves. “Nice-to-have” means just that: Nice. To. Have. So, don’t worry if you can’t check off every box. We’re not hiring a list of bullet points; we’re interested in the whole you.

Understanding of monitoring and troubleshooting Splunk environments.
Understanding of administering or architecting distributed Splunk environments.
Understanding of the development and deployment of a hosted cloud environment, preferably Azure.
Experience with Bash or PowerShell for scripting, and Git or similar version control systems.
An understanding of systems programming (network stack, file system, OS services) and networking (L2 vs. L3, network architecture, VLANs, etc.).
Knowledge of standard methodologies related to security, performance, and disaster recovery.
Ability to obtain a favorably adjudicated US Single Scope Background Investigation (SSBI).