Site Reliability Engineer

TN United Kingdom

London

On-site

GBP 150,000 - 200,000

Full time

6 days ago

Job summary

Join a next-generation telecoms software company as a Site Reliability Engineer. This role focuses on enhancing the reliability and performance of systems, driving the evolution of DevOps practices, and collaborating with various teams to ensure seamless integration and deployment. You will be at the forefront of implementing best practices in Site Reliability Engineering, utilizing your expertise in tools like Git and Kubernetes. This position offers an exciting opportunity to work in a fast-paced environment where your contributions will directly impact the success of innovative telecom solutions.

Qualifications

  • Experience with SRE best practices and GitOps tools.
  • Expertise in Kubernetes and cloud technologies.

Responsibilities

  • Develop and promote SRE culture and best practices in the team.
  • Monitor and improve application performance and stability.

Skills

SRE best practices
Git
GitOps
Logging solutions
Monitoring solutions
Cloud knowledge
Kubernetes
Non-functional testing
Communication skills
Telecommunications knowledge

Job description

We are hiring for a next-generation telecoms software company that is seeking a Site Reliability Engineer to join its expanding team.

Primary Function of the Position

Reporting to the Site Reliability Engineer Team Lead, the Site Reliability Engineer will be responsible for ensuring the reliability, scalability and performance of our systems.

Responsibilities include:

  • Develop the Site Reliability Engineering culture across the team by applying best practices, approaches and code.
  • Apply automation, and propose and implement software, for any tasks or parts of the system where doing so would deliver benefit.
  • Monitor application performance, identifying and implementing improvements to performance and stability.
  • Collaborate on the design and implementation of the pipelines and processes for deployment to the production environment.
  • Work closely with the Platform and Software domains to ensure continuous improvement of performance and stability whilst adhering to standards.
  • Undertake ad-hoc projects and other activities as required.

Key Accountabilities and Activities

Contribute to the SRE function including:

  • Drive evolution of the DevOps / GitOps toolchain, promoting improvements to streamline the software delivery process and showing improvements through metrics.
  • Accountable for halting a project or product if the solution is not technically acceptable.
  • Responsible for producing and maintaining documentation relating to application design, integration processes, testing procedures, and deployment approach, as well as collaborating with teams to create operational runbooks and playbooks.

Integration with Domains including:

  • Collaborating with Domains to plan, design, test and maintain the application.
  • Defining design patterns for any component or structure under SRE responsibility.
  • Implementing components such as monitoring and logging.
  • Managing the runbook preparations of Domains.

Liaise and support other teams on work items including:

  • Developing, refining, and tuning integrations between application elements.
  • Collaborating with stakeholders in the Enterprise, Solution and Development teams to produce and maintain standards and guidelines.
  • Sharing knowledge with, and educating, team members across the organisation.
  • Acting as first point of contact for the Problem Management and Process Outcomes team.

Build and guide successful SRE efforts including:

  • Analysing and resolving technical and application issues.
  • Researching and evaluating software products.
  • Evaluating risks and defects, analysing specifications, and customising applications for specific customer needs.
  • Identifying complex and manual processes and working to simplify and automate them.
  • Continuously reviewing the capabilities and roles critical to evolving DevOps and quality assurance practices, and taking responsibility for their acquisition, development, and maturity.
  • Minimising outages through continuous improvement.

Experience and Skills

  • Experience and demonstrable knowledge of SRE best practices
  • Expert in Git and GitOps
  • Expert in logging and monitoring solutions (Prometheus, Grafana, etc.)
  • Demonstrable knowledge of cloud technologies
  • Expert knowledge of Kubernetes
  • Proficient written and verbal communication in English
  • Understanding of non-functional testing
  • Proven ability to work independently and collaboratively in a fast-paced technical environment
  • Demonstrable knowledge of the telecommunications industry and technologies
  • Proven experience and ability to provide support to direct reports