Data Engineer

Be among the first applicants.
Dubizzle
Dubai
AED 120,000 - 200,000
Job description
Roles and Responsibilities

The Data Engineer intern will participate in exciting projects covering the end-to-end data lifecycle - from raw data integrations with primary and third-party systems, through advanced data modelling, to state-of-the-art data visualisation and the development of innovative data products.

You will have the opportunity to learn how to build and work with both batch and real-time data processing pipelines. You will work in a modern cloud-based data warehousing environment alongside a team of diverse, driven and interesting co-workers. You will liaise with other departments - such as product & tech, the core business verticals, trust & safety, finance and others - to help them succeed.

Key Responsibilities Include:

  1. Raw data integrations with primary and third-party systems
  2. Data warehouse modelling for operational and application data layers
  3. Development in Amazon Redshift cluster
  4. SQL development as part of agile team workflow
  5. ETL design and implementation in Matillion ETL
  6. Design and implementation of data products enabling data-driven features or business solutions
  7. Data quality, system stability and security
  8. Coding standards in SQL, Python, ETL design
  9. Building data dashboards and advanced visualisations in Periscope Data with a focus on UX, simplicity and usability
  10. Working with other departments on data products - e.g. product & technology, marketing & growth, finance, core business, advertising and others
  11. Being part of, and contributing to, a strong team culture and the ambition to stay on the cutting edge of big data
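The ETL work described above (items 5 and 8) can be illustrated with a minimal extract-transform-load sketch in Python. This is not Matillion-specific; sqlite3 stands in for the Redshift cluster, and all table, column and function names are hypothetical.

```python
import sqlite3

def extract(rows):
    """Extract: pull raw records from a hypothetical source system."""
    return rows

def transform(rows):
    """Transform: drop records missing an id, normalise email casing."""
    out = []
    for r in rows:
        if r.get("id") is None:
            continue  # incomplete record: excluded from the load
        out.append({"id": r["id"], "email": r.get("email", "").strip().lower()})
    return out

def load(conn, rows):
    """Load: upsert into a warehouse table (sqlite3 standing in for Redshift)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (id, email) VALUES (:id, :email) "
        "ON CONFLICT(id) DO UPDATE SET email = excluded.email",
        rows,
    )
    conn.commit()

raw = [{"id": 1, "email": " Alice@Example.com "}, {"id": None, "email": "x"}]
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(raw)))
```

In a production pipeline each stage would be a separate, monitored job; the point here is only the shape of the extract/transform/load separation.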

Requirements

  • Bachelor's degree in computer science, engineering, math, physics or any related quantitative field
  • Knowledge of relational and dimensional data models
  • Knowledge of terminal operations and Linux workflows
  • Ability to communicate insights and findings to a non-technical audience
  • Good SQL skills across a variety of relational data warehousing technologies, especially cloud data warehouses (e.g. Amazon Redshift, Google BigQuery, Snowflake, Vertica)
  • Attention to detail and analytical thinking
  • Entrepreneurial spirit and the ability to think creatively; highly driven and self-motivated; strong curiosity and a drive for continuous learning
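The "relational and dimensional data models" requirement above refers to designs such as a star schema: a fact table of events keyed to surrounding dimension tables. A minimal sketch, again using sqlite3 in place of a cloud warehouse, with hypothetical table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: one row per product, addressed by a surrogate key
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    -- Fact table: one row per sale, referencing the dimension
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO fact_sales VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

# The typical dimensional query: aggregate the fact table,
# grouped by an attribute of a dimension.
rows = conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p USING (product_key)
    GROUP BY p.name ORDER BY p.name
""").fetchall()
# gadget: 7.5, widget: 15.0
```

Keeping descriptive attributes in dimensions and measurements in facts is what lets such queries stay simple as the warehouse grows.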

Desired Candidate Profile

  1. Data Architecture and Infrastructure
    • Designing Data Pipelines: Develop, construct, and maintain efficient data pipelines that enable the movement and transformation of large datasets between various systems and storage solutions.
    • Building Data Warehouses: Create data storage solutions like data warehouses or data lakes that allow easy access, retrieval, and analysis of data from various sources (e.g., transactional databases, cloud platforms).
    • Database Design and Optimization: Design databases, ensuring that they are scalable, secure, and optimized for both performance and storage.
  2. Data Collection and Integration
    • Data Integration: Integrate data from a variety of sources such as APIs, databases, flat files, cloud storage, and real-time data streams into centralized systems.
    • ETL Processes (Extract, Transform, Load): Develop and maintain ETL processes to clean, transform, and load raw data into usable formats for analytics.
    • Data Governance: Ensure that data is accurate, consistent, and secure by enforcing data governance practices, and maintaining data quality standards.
  3. Data Transformation and Processing
    • Data Cleaning: Process and clean raw data to remove inconsistencies, errors, or duplicates, ensuring that the data used for analysis is reliable and of high quality.
    • Data Transformation: Transform data into a structured format suitable for analysis, reporting, and further processing by data scientists and analysts.
  4. Performance and Scalability
    • Optimization: Continuously monitor, optimize, and troubleshoot data pipelines and storage solutions to ensure they perform efficiently at scale, especially as data volumes grow.
    • Automation: Automate repetitive tasks like data loading and monitoring to reduce manual effort and improve the efficiency of data processing.
  5. Collaboration with Data Scientists and Analysts
    • Collaborate on Data Needs: Work closely with data scientists and analysts to understand their data requirements and provide them with clean, organized, and ready-to-use datasets.
    • Provide Data Access: Ensure that analysts and other users can easily access and query the data they need by setting up efficient querying tools and user interfaces.
  6. Cloud Platforms and Big Data
    • Cloud Solutions: Leverage cloud-based platforms (e.g., AWS, Google Cloud Platform, Microsoft Azure) for scalable data storage and computing resources.
    • Big Data Technologies: Implement and manage big data technologies (e.g., Hadoop, Spark, Kafka) to process and analyze large datasets across distributed systems.
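The "Data Cleaning" step in item 3 above can be sketched in a few lines of Python. This is an illustrative sketch only; real pipelines would push such logic into the warehouse or an ETL tool, and the field names here are hypothetical.

```python
def clean(records):
    """Deduplicate by id (keeping the latest occurrence) and drop
    incomplete rows - a minimal version of a data-cleaning pass."""
    latest = {}
    for rec in records:
        if rec.get("id") is None or rec.get("value") is None:
            continue  # missing field: row is unreliable, drop it
        latest[rec["id"]] = rec  # later duplicates overwrite earlier ones
    return list(latest.values())

raw = [
    {"id": 1, "value": 10},
    {"id": 1, "value": 12},    # duplicate id: the later row wins
    {"id": 2, "value": None},  # missing value: dropped
]
cleaned = clean(raw)
```

The same keep-latest-per-key pattern appears in SQL as a window function (`ROW_NUMBER() OVER (PARTITION BY id ORDER BY ... DESC)`) in warehouse-side cleaning.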