DevOps - SRE (Site Reliability Engineering)

Encora Inc.
Manaus
Remote work
BRL 60.000 - 100.000
Job description

Description

Important Information

Location: Brazil

Job Mode: Full-time

Work Mode: Work from home

Job Summary

Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goal is to create scalable and highly reliable software systems.

Responsibilities and Duties

  1. Utilize software tools and automated tasks for continuous monitoring and reliability of applications;
  2. Act swiftly in response to emergency situations impacting system reliability in production environments, performing root cause analysis for ongoing incidents;
  3. Oversee and streamline change management processes to enhance system performance and reliability;
  4. Own releases to production environments;
  5. Work closely with development teams throughout the software lifecycle, focusing on solving system-related issues and eliminating toil by automating routine tasks for enhanced productivity;
  6. Focus on the reliability and scalability of systems, ensuring high performance and efficiency standards.

Essential Skills

  1. Proficiency in monitoring tools like Azure Monitor, Application Insights, Prometheus, and Grafana;
  2. Project tracking and version management with tools like JIRA, SVN, GitHub;
  3. Expertise with Infrastructure as Code (Terraform, ARM/Bicep, Pulumi, etc.) and release management tooling (ArgoCD, Harness, Octopus, etc.);
  4. Experience with incident alerting tools (PagerDuty, Opsgenie) and container orchestration tools like Kubernetes, AKS, and similar.

About Encora

Encora is the preferred digital engineering and modernization partner of some of the world’s leading enterprises and digital native companies. With over 9,000 experts in 47+ offices and innovation labs worldwide, Encora’s technology practices include Product Engineering & Development, Cloud Services, Quality Engineering, DevSecOps, Data & Analytics, Digital Experience, Cybersecurity, and AI & LLM Engineering.

At Encora, we hire professionals based solely on their skills and qualifications and do not discriminate based on age, disability, religion, gender, sexual orientation, socioeconomic status, or nationality.

Important: Please apply with your CV in English. Luxoft only operates under CLT contracts.

Project Description

Do you enjoy working with existing and new software product development teams? This position instruments end-to-end observability and visibility for business-critical systems through log ingestion, metrics, and traces. You will work as a site reliability engineer (SRE), collaborating with product teams, infrastructure, and other stakeholders.

Job Overview

We are currently seeking a Datadog Engineer / Site Reliability Engineer with strong expertise in Datadog, APM monitoring, and observability for an exciting opportunity with one of our clients. As part of this project, you will be responsible for improving the performance, reliability, and overall health of critical systems.

Escalation Engineer/Site Reliability Engineer Role Responsibilities

  1. Investigate, troubleshoot and diagnose incidents;
  2. Provide first-line investigation and diagnosis of incidents and Service Requests;
  3. Act as the incident coordinator for operational incidents on the core engineering production platform, including all internal technical communications.