#Halian is hiring a Data Engineer (Generative AI) in Al Khobar, KSA.
Greetings!
Experience required: 6+ Years
Duration: 1+ Year & Extendable
Technical Skills:
5 to 10 years of experience in Data Engineering, including building and maintaining scalable data pipelines and working with large, complex datasets.
Proficiency in Python, Java, or Scala for data processing and pipeline automation.
Experience with ETL frameworks and tools such as Apache Airflow, Luigi, or similar for workflow orchestration.
Strong experience with data storage solutions such as SQL and NoSQL databases, and with data lake/warehouse and processing platforms (e.g., Hadoop, Spark, or Snowflake).
Deep understanding of data formats (e.g., JSON, Avro, Parquet) and transformation techniques.
Familiarity with Machine Learning workflows and tools such as TensorFlow, PyTorch, or Scikit-learn, and knowledge of how data integrates with AI models.
Strong problem-solving skills and ability to work in a fast-paced, collaborative environment.
Excellent communication skills.
Key Responsibilities:
Design, Develop, and Maintain Data Pipelines: Build robust, scalable, and high-performance data pipelines to support Generative AI models. Ensure efficient data collection, transformation, and integration processes that meet the performance needs of machine learning applications.
Data Management: Oversee data storage solutions, manage structured and unstructured datasets, and ensure the integrity, security, and availability of data assets. Maintain comprehensive documentation on data sources and pipeline architecture.
Collaboration with Data Scientists: Work closely with Data Scientists to understand data needs for training Generative AI models. Provide clean, well-structured, and optimized datasets to enable model development and improve model accuracy.
Partnership with Machine Learning Engineers: Collaborate with Machine Learning Engineers to integrate datasets into production environments. Assist with the deployment of models by ensuring real-time data flow, effective data storage, and seamless interaction between systems.
Optimize Data Pipelines for Scalability and Performance: Continuously monitor, analyze, and improve the efficiency of data pipelines. Implement best practices for managing large-scale data processing systems and ensure minimal downtime.
Automation and Monitoring: Automate routine tasks related to data ingestion, transformation, and monitoring. Use modern tools and frameworks to ensure smooth operation and early detection of issues in data flow.
* Interested candidates, please share your resume with [HIDDEN TEXT]