Lead System Administrator

KAUST (King Abdullah University of Science and Technology)
Jeddah
SAR 120,000 - 180,000
Job Description

Position Summary

Serve as the lead of the team responsible for the smooth operation of a Linux cluster of 300+ GPU/CPU compute nodes, including its parallel filesystems and high-performance network. This is a partly technical, partly people-leadership role that involves supervising 3-4 experienced HPC system administrators. The role also involves developing, implementing, and overseeing standard operating procedures for the system and the team.

Major Responsibilities

  1. System operation and upgrade planning to meet laboratory and customer requirements
  2. Workload scheduler policy development and implementation
  3. Support of high-performance filesystems
  4. Network infrastructure management including TCP/IP and HPC networks
  5. Use of scripting languages for node automation and configuration management
  6. Hardware failure and spare-parts management
  7. Build effective relationships with staff, faculty and students through the Core Labs
  8. Manages multiple or significant projects that may require sophisticated project planning techniques
  9. Plans, schedules, conducts, or coordinates detailed phases of a major project, or the whole of a project of moderate scope
  10. Identifies technical training needs for staff attached to the area
  11. Serves as a resource and as a team member in responding to security and safety incidents
  12. Creates opportunities to enhance technical methodology or content by expanding existing efforts or developing new ones; may extend technology into new application areas; contributes to or leads major intellectual development activities
  13. Provides innovative problem-solving approaches to enhance organizational capabilities; uses peer network to expand technical capabilities and identify new research opportunities
  14. Understands broad strategic objectives and contributes to them; nurtures and maintains relationships with major customers
  15. May initiate new project concepts; develops technical proposals and makes presentations to potential customers
  16. Supervises several scientists, engineers, or technicians on assigned work; provides major input to the staffing of overall project teams; builds teams to optimize efficiency and cost-effectiveness
  17. Identifies and evaluates candidates for open positions; mentors and trains staff in technical, project, and business development skills

Competencies

  1. SLURM workload manager including GPU scheduling
  2. Parallel filesystems (WekaIO, Lustre)
  3. TCP/IP and high-performance networks (InfiniBand)
  4. Proficient in scripting languages (e.g. Bash, Python, Ruby)
  5. Familiar with configuration management tools (Puppet)
  6. Strong documentation skills
  7. Will have working level contact with users and suppliers
  8. Demonstrates an analytical and systematic approach to problem solving
  9. Takes the initiative in identifying and negotiating appropriate development opportunities
  10. Demonstrates effective communication skills in written and oral English
  11. Works effectively with other teams in the Supercomputing Laboratory
  12. Plans, schedules, and monitors own work (and that of others) competently within tight deadlines and in accordance with relevant legislation and procedures
  13. Ability to work successfully in a highly collaborative research environment
  14. Uses discretion in identifying and resolving complex problems and assignments
  15. Performs a broad range of work, sometimes complex and non-routine, in a variety of environments
  16. Maintain expert-level knowledge in most of the laboratory systems, including high performance computing systems administration, high performance storage administration, or high performance network administration

Qualifications and Experience

  1. Bachelor of Science (or equivalent) in a relevant discipline plus 10 years’ experience, OR Master of Science (or equivalent) in a relevant discipline plus 7 years’ experience OR Doctor of Philosophy (or equivalent) in a relevant discipline plus 5 years’ experience.