Manage and lead projects for the acquisition and integration of alternative data sources, ensuring timely delivery of project milestones and objectives
Work closely with IT to build and implement ETL (extract, transform, load) processes to improve data ingestion from internal and external sources
Design the master data management (MDM) roadmap and develop scripts to automate standard and periodic reports and data feeds into downstream systems
Ensure data quality and integrity by implementing robust data validation and cleansing processes
Prepare clear and concise documentation of the required data elements and formats, and work with technical teams to continuously improve, refine and expand the datasets to enhance the quality and effectiveness of the processes they support
Requirements:
A recognised degree in Computer Science, Data Science, Mathematics, Statistics or a related discipline, with a minimum of 8 years' working experience in data infrastructure and/or next-generation technologies
Proficient in data programming languages (e.g. SAS, Python, SQL), with hands-on experience and knowledge of the ETL process
Possess strong knowledge of cloud technology and the Apache Hadoop ecosystem, as well as big data querying tools such as Apache Hive and Apache Impala
Adept at handling and understanding large datasets and relational databases across upstream and downstream processes
Able to write clear and concise business requirements
Possess experience in scripting and automation, data modelling, and algorithm and data transformation techniques
Exposure to data visualisation tools (e.g. Tableau, Power BI) and knowledge of data science, artificial intelligence and machine learning would be an added advantage