Onsite
We're Hiring: Site Reliability Engineer (Observability)!
We are seeking a skilled Site Reliability Engineer with a focus on observability to join our dynamic team. The ideal candidate will have a solid understanding of monitoring, incident response, and performance optimization to ensure the reliability and availability of our services.
What You'll Do:
Implement and maintain observability tools and practices
Monitor system performance and troubleshoot issues proactively
Automate operational processes for efficiency and scalability
Collaborate with development teams to enhance application resilience
Analyze logs and metrics to identify areas for improvement
Participate in incident management and post-mortem analysis
What We’re Looking For:
2+ years of experience in site reliability engineering or a related field
Proficiency in monitoring tools such as Prometheus, Grafana, or similar
Strong scripting skills in languages like Python or Bash
Familiarity with cloud platforms (AWS, GCP, Azure)
Excellent problem-solving abilities and attention to detail
Effective communication skills for cross-team collaboration
Ready to make an impact? Apply now and let’s grow together!