Lead Data Ops Engineer

Emirates Global Aluminium (EGA)
United Arab Emirates
AED 120,000 - 200,000
Job description

The Lead Data Ops Engineer will play a critical role in building, scaling, automating, and maintaining the company's Big Data infrastructure and Machine Learning operations (MLOps). This individual will work in close collaboration with our Data Architect, Data Science, Data Engineering, and IT teams to ensure the development, deployment, and scaling of robust, high-performance data processing systems and ML models.

KEY ACCOUNTABILITIES:

  1. Big Data Infrastructure: Design, build, and maintain high-performance, cloud-based, fault-tolerant, scalable distributed data infrastructure that supports the company's data-intensive applications (Real time/Batch/LLMs). This includes developing strategies for data storage (TBs), processing, and analysis, and implementing high-performance, scalable data pipelines for ML models and data products, supporting up to 50-60 use cases a year and thousands of IoT devices.
  2. Infrastructure as Code: Create infrastructure as code, and configure and set up managed data services. Build and deploy a data science playground for research and prototyping, in support of the professional and citizen data science program being rolled out, which includes 15-20 citizen data scientists/ambassadors.
  3. Machine Learning Operations (MLOps): Develop and manage the ML operational process, working closely with the data science team to implement ML models into production, including edge. This includes streamlining the ML lifecycle, from model development and testing to deployment and implementing the monitoring and alerting strategy.
  4. Automation and Scalability: Implement automation tools and frameworks to manage system updates/changes. Ensure that all systems and infrastructures can scale effectively with the increase in IoT sensors and devices.
  5. Continuous Integration and Deployment (CI/CD): Oversee continuous integration and continuous deployment practices for the data and ML pipeline, ensuring that software can be reliably released at any time.
  6. System Monitoring and Reliability: Monitor system performance and reliability to ensure high levels of performance, availability, and security. This includes identifying and fixing potential and existing system issues.
  7. Collaboration and Communication: Collaborate closely with the Data Architect, data engineers, and data scientists to implement and test new data services that provide an elastic data infrastructure.
  8. Security: Oversee and ensure that all Big Data and ML Ops platforms comply with the company's security standards and policies.
  9. Mentorship and Leadership: Act as a mentor to junior data team members, providing guidance and support in their professional development.
  10. Innovation and Continuous Improvement: Stay up-to-date with industry trends and new technologies. Continuously explore innovative solutions and enhancements to the existing data architecture to improve its scalability, reliability, and efficiency.
  11. Problem Solving: Anticipate and resolve technical issues before they become roadblocks, maintaining the continuity of data flow and ensuring the highest levels of data quality and integrity.

AUTHORITY/DECISION MAKING:

  1. Infrastructure Design: Decide on the most effective design and implementation of the company's Big Data infrastructure.
  2. ML Ops Process: Make key decisions on the ML operational process, ensuring that ML models can be effectively integrated into production.
  3. Automation Tools: Choose the most appropriate automation tools and frameworks for the company's needs.
  4. CI/CD Practices: Determine the best practices for continuous integration and deployment in the context of the company's operations.
  5. System Monitoring: Make decisions on system monitoring strategies, including the selection of tools and responses to system performance metrics.
  6. Security Policies: Have a say in the implementation of security policies as they pertain to the Big Data and ML Ops platforms.
  7. Budget and Costing: Take ownership of managing the costs of the data platform and related data services.

QUALIFICATIONS & SKILLS:

  1. Bachelor's degree required, MS or PhD preferred.
  2. Bachelor's in Data Science, Computer Science, Engineering, or Statistics, with 10+ years of relevant experience.
  3. Experience: A minimum of 5-7 years of experience in a DevOps role, with a focus on managing Big Data infrastructures and MLOps.
  4. Technical Skills:
     a. Strong experience with Big Data technologies such as Hadoop, Spark, Kafka, etc.
     b. Proven expertise in managing and deploying ML models into production.
     c. Proficient in using CI/CD tools like Jenkins, Travis CI, CircleCI, etc.
     d. Proficient in using infrastructure automation tools like Terraform, Chef, Puppet, Ansible, etc.
     e. Strong knowledge of cloud platforms such as Azure (or AWS, GCP).
     f. Experience with containerization technologies like Docker, Kubernetes, etc.
     g. Familiarity with various database technologies, both SQL and NoSQL.
     h. Proficiency in programming languages such as Python, Java, or Scala.
     i. Experience leveraging the Microsoft/Azure ecosystem to manage the development and maintenance of cloud platform operations.