The Kubernetes Systems Administrator - L2 is responsible for assisting with infrastructure design, service maintenance and upgrades, implementation and maintenance of supporting tooling and automation, and incident troubleshooting and resolution. The incumbent will work closely with the IT Operations team to continually build, enhance, and maintain our highly available on-prem Kubernetes infrastructure.
Your Job:
Assist in the design (major or incremental) of Kubernetes infrastructure
Assist in the provisioning of Kubernetes infrastructure
Identify, troubleshoot, diagnose, and correct systems-related issues as they arise
Monitor and review dashboards, logs, and critical alerts to ensure system issues are identified, escalated, and addressed as quickly as possible
Provide ongoing operational support and work closely with team members on issue resolution
Manage and administer incident, problem, and service request queues
Develop and maintain system use guides, run books, and other documentation as needed
Identify and evaluate opportunities for automation, and implement them where appropriate
About You:
Experience managing Kubernetes infrastructure in self-hosted production environments
3+ years of experience as a Systems Administrator, preferably within a software development environment
Proficient at installing, configuring, and administering Kubernetes distributions
Experience with Docker containers; experience with other containerization technologies is an asset
Solid understanding of both general and Kubernetes-specific networking/firewall concepts and their role in Kubernetes infrastructure and application delivery
Experience with complex IT infrastructure architecture planning, design, and implementation
Ability to adapt when working with multiple teams on projects with varying degrees of flexibility/rigidity at different points in the development cycle
Previous experience working in an environment with formally structured IT Operational processes: work request ticket management, incident management, change management, and problem management
Proficiency in scripting languages such as Python, Ruby, Perl, and/or Bash
Ability to develop and maintain positive working relationships
Ability to work in a team environment and independently as needed
Ability to adapt to change and work well under pressure
Ability to multitask and manage numerous projects
Ability to take on internal operational initiatives as a prime or lead
Excellent communication, organizational, interpersonal, problem-solving, and documentation skills
Experience running and supporting a global 24x7 internet-based service or product is considered an asset
Service management certifications such as ITIL are considered an asset