- Responsible for planning, designing, and deploying enterprise IT infrastructure, such as operating systems, servers, storage, and virtualization.
- Optimize system architecture to meet business needs, improving performance, stability, and scalability.
- Responsible for day-to-day operation and maintenance, performance tuning, and troubleshooting of Linux and Windows systems.
- Monitor system resource usage, identify performance bottlenecks, and optimize configurations to improve resource utilization.
- Design and develop operations automation tools to make system management more efficient and reduce the risk of manual errors.
- Help optimize CI/CD processes to automate deployments and system updates.
- Develop and implement incident response plans, and respond quickly to resolve unexpected system failures.
- Perform in-depth root cause analysis of failures, produce complete incident reports and remediation plans, and continuously improve system stability.
- Responsible for system security hardening, vulnerability remediation, and permission management to ensure compliance and data security.
- Conduct regular security audits and backup and recovery drills to strengthen system protection.
- Write and maintain technical documentation such as system architecture diagrams, operations manuals, and incident handling procedures.
- Provide training on system usage and troubleshooting to the development team and other relevant staff.
1. Education and experience:
- Bachelor's degree or higher in computer science, information engineering, communication engineering, or a related field.
- More than 5 years of experience as a systems engineer or in a related role; experience managing large-scale distributed systems is preferred.
2. Technical skills:
- Proficient in Linux (CentOS, Ubuntu, Red Hat), with in-depth system tuning experience.
- Familiar with deploying and maintaining common middleware and services such as Nginx, Tomcat, Redis, and MySQL.
- Familiar with container technologies such as Docker and Kubernetes, and experienced with CI/CD tools such as Jenkins and GitLab CI.
- Proficient in programming and scripting languages such as Shell, Python, and Go, with the ability to build operations automation.
3. Problem-solving and emergency response:
- Strong system debugging and problem analysis skills, with the ability to quickly locate and resolve complex issues.
4. Communication and collaboration:
- Good communication and teamwork skills, with the ability to work closely with development, network, and security teams.
- More than 2 years of team leadership experience.
5. Preferred qualifications:
- Relevant certifications such as RHCE, MCSE, or AWS/GCP/Azure certifications.
- Experience designing and operating high-availability (HA) systems.