HPC Cluster System Engineer

United Arab Emirates University
Al Ain
AED 50,000 - 200,000
Job Description

The HPC cluster systems engineer is responsible for managing and supporting all HPC systems and Grid systems for the University data center and distributed locations.

Duties and Responsibilities:

  1. Solves HPC and Grid related problems on a daily basis.
  2. In support of change management within the data center, provides the CSC with information about the HPC systems.
  3. Verifies all HPC systems daily using the monitoring tools and proactively intervenes to solve problems.
  4. Analyzes solution components, understands systems integration challenges, and identifies technology gaps.
  5. Resolves or proposes solutions to these gaps to reach future performance targets and functionality requirements.
  6. Prototypes features, performs integration checkout of various software components, and collaborates with component developers and solutions architects.
  7. Develops and drives validation test content and evaluates system components.
  8. Engages with industry partners as required to identify and investigate best-known methods used in the HPC community, and applies those methods.
  9. Collaborates with architects and developers to define architectural requirements for high-end HPC clusters.
  10. Responsible for system integration and validation of UAEU HPC clusters.
  11. Responsible for monitoring all HPC and Grid services.
  12. Coordinates work with vendors for support.
  13. Tests and deploys HPC systems.
  14. Applies knowledge of IT Service Management frameworks.
  15. Maintains accurate and comprehensive documentation diagrams of the enterprise HPC system, backup infrastructure, communications flow, and routing.
  16. Other duties as assigned.

Minimum Qualification:

  • Bachelor's degree in Computer Engineering or Computer Science required.
  • 3+ years of experience with software development in Linux.
  • 3+ years of experience with HPC clusters and systems integration.

Preferred Qualification:

  • Knowledge of server hardware components, diagnostics and replacing defective items.
  • Good communication and report-writing skills.
  • Must be able to work under pressure in a fast-paced work environment.
  • Must be able to work flexible hours, including evenings, weekends, holidays, and overtime as required; should be available on call 24/7 in case of a major service outage.
  • Strong problem solving, testing, and network troubleshooting skills.
  • Cluster solutions integration and administration.
  • Linux operating systems and OS components for HPC clusters.
  • Cluster provisioning, systems management, resource management middleware.
  • Cluster interconnect fabrics and software stack.
  • HPC Cluster storage solutions.
  • Parallel programming models for HPC clusters.

Close Date: Kindly apply before the closing date. Open until filled.
