Work with Engineering to create proofs of concept and define the AWS components to use based on business requirements.
Analyze and present comparisons of multiple AWS solutions and non-AWS alternatives to executives for final decision-making.
Build and set up new development tools and infrastructure, drawing on knowledge of continuous integration, delivery, and deployment (CI/CD), cloud technologies, container orchestration, and security. Build and test end-to-end CI/CD pipelines, ensuring that systems are protected against security threats.
In-depth expertise with the Kubernetes stack: Operators, CRDs, policies, and TLS.
Infrastructure as Code (IaC): Design and implement automated infrastructure provisioning and management using IaC tools like Terraform, Ansible, Chef, or Puppet.
Configuration Management: Use automation tools to ensure consistent configuration management across environments and ensure system reliability.
Containerization and Orchestration: Manage and scale containerized applications using Docker, Kubernetes, or other container orchestration platforms. Automate the deployment of containers to ensure scalable and reliable environments.
Cloud Infrastructure: Manage cloud infrastructure (AWS, Azure, GCP) and utilize cloud-native services to optimize cost, performance, and availability.
Monitoring and Logging:
System Monitoring: Implement monitoring solutions using tools like Prometheus, Nagios, Datadog, Zabbix, or New Relic to track system performance and application health.
Log Management: Use centralized logging solutions like ELK Stack (Elasticsearch, Logstash, Kibana) or Splunk to collect, analyze, and visualize logs for system troubleshooting, performance optimization, and security monitoring.
Alerting and Incident Management: Set up alerting systems and respond proactively to issues by using tools like PagerDuty, Opsgenie, or VictorOps to reduce downtime and enhance system reliability.
Collaboration and Communication:
Cross-Functional Collaboration: Collaborate with development, operations, and QA teams to ensure efficient software delivery pipelines, smooth infrastructure management, and alignment of goals and processes.
Mentorship and Leadership: Provide mentorship to junior DevOps engineers and promote best practices in automation, CI/CD, and system administration.
Stakeholder Communication: Act as a liaison between different technical teams and non-technical stakeholders to communicate DevOps strategies, progress, and challenges.
Performance Optimization:
System Tuning and Optimization: Continuously monitor and improve system performance, ensuring that infrastructure, applications, and services are running optimally. Work on reducing latency, improving throughput, and scaling systems as needed.
Cost Optimization: Work with cloud providers and teams to optimize costs by selecting the right resources, optimizing workloads, and leveraging appropriate services and automation tools to manage usage efficiently.
Security and Compliance:
Security Automation: Integrate security practices into the CI/CD pipeline and ensure that security measures are automated. Use tools like Vault for secrets management, and SonarQube and OWASP tools (e.g., Dependency-Check) for vulnerability scanning and compliance checks.
Access Control and IAM: Implement secure access control policies using Identity and Access Management (IAM) tools, enforce the principle of least privilege, and manage permissions effectively.
Compliance and Auditing: Ensure that infrastructure and deployment pipelines adhere to security policies and compliance requirements (e.g., GDPR, HIPAA, PCI-DSS).
Desired Candidate Profile
Technical Skills:
CI/CD Tools: Proficiency with Jenkins, GitLab CI, CircleCI, Travis CI, or similar tools for continuous integration and deployment.
Cloud Platforms: Expertise in cloud computing platforms like AWS, Microsoft Azure, or Google Cloud Platform (GCP), and experience with their associated services.
Containerization and Orchestration: Experience with Docker, Kubernetes, OpenShift, or similar tools for container orchestration.
Infrastructure as Code (IaC): Strong knowledge of IaC tools like Terraform, CloudFormation, Ansible, or Puppet.
Scripting and Automation: Proficiency in scripting languages such as Python, Bash, or Ruby for automating processes and managing infrastructure.
Monitoring and Logging: Experience with tools like Prometheus, Grafana, Datadog, Splunk, and ELK Stack to monitor system performance and manage logs.
Security Skills:
Security Best Practices: Familiarity with integrating security into DevOps processes, including vulnerability scanning, secure coding practices, and compliance monitoring.
Identity and Access Management (IAM): Experience with managing access controls and ensuring the security of cloud and on-prem systems.
Problem-Solving and Troubleshooting:
Ability to diagnose and resolve complex issues in large-scale distributed systems.
Proficiency in troubleshooting at multiple levels (infrastructure, application, network, and services).
Collaboration and Communication:
Ability to work closely with cross-functional teams, communicate technical concepts to non-technical stakeholders, and mentor junior engineers.
Strong communication skills to ensure transparency and maintain alignment across teams.
Performance Optimization:
Ability to assess system performance, troubleshoot bottlenecks, and implement solutions for optimization.
Leadership and Mentorship:
Experience leading and mentoring DevOps teams, establishing best practices, and driving the adoption of new tools and processes.
Tools and Technologies Used by Senior DevOps Engineers:
CI/CD and Automation Tools:
Jenkins, GitLab CI, CircleCI, Travis CI, Azure DevOps for automating builds, tests, and deployments.
Terraform, Ansible, Chef, Puppet for infrastructure automation and configuration management.
Cloud Platforms and Services:
AWS, Azure, Google Cloud for cloud infrastructure management.
CloudFormation (AWS), ARM Templates (Azure) for infrastructure as code in the cloud.
Containerization and Orchestration:
Docker for creating containerized applications.
Kubernetes, OpenShift for container orchestration, scaling, and management.
Monitoring and Logging:
Prometheus, Grafana, Datadog for monitoring.
ELK Stack (Elasticsearch, Logstash, Kibana), Splunk for log management.
Version Control:
Git, GitHub, GitLab, Bitbucket for version control.
Security and Compliance:
Vault for secrets management; SonarQube and OWASP tools for code scanning and securing the CI/CD pipeline.
Cloud IAM services such as AWS IAM and Azure AD (now Microsoft Entra ID).
Networking and Load Balancing:
HAProxy, Nginx, Traefik for load balancing and reverse proxy configurations.
Incident Management:
PagerDuty, Opsgenie, VictorOps for alerting and incident management.