Guides and assists others in building appropriate-level designs and gaining consensus from peers where appropriate
Experience with technical observability tools such as Dynatrace or Datadog would be very useful, and experience with distributed tracing would also be beneficial
Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
Collaborates with other software engineers and teams to design, develop, test and implement availability, reliability and scalability solutions in their applications
Implements infrastructure, configuration and network as code for the applications and platforms in your remit
Collaborates with technical experts, key stakeholders and team members to resolve complex problems
Understands service level indicators and uses service level objectives to proactively resolve issues before they impact customers
Supports the adoption of site reliability engineering best practices within your team
Required Qualifications, Capabilities and Skills:
Proficient in site reliability culture and principles, and familiar with how to implement site reliability within an application or platform
Developer background with proficiency in Java backend engineering (Core Java, microservices and the Spring Framework)
Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., cloud, artificial intelligence, Android)
Experience in observability, such as white-box and black-box monitoring, service level objective alerting and telemetry collection, using tools like Grafana, Dynatrace, Prometheus, Datadog, Splunk and others
Experience with CI/CD tools like Jenkins, GitLab and Terraform
Familiarity with containers and container orchestration technologies such as ECS, Kubernetes and Docker
Familiarity with troubleshooting common networking technologies and issues