Responsible for creating and managing the technological components of the data infrastructure at every step of the data flow. From configuring data sources to integrating analytical tools, all of these systems are architected, built, and managed by this general-role data engineer.
Minimum education (essential):
Bachelor’s degree in Computer Science or Engineering (or similar)
Minimum education (desirable):
Honors degree in Computer Science or Engineering (or similar)
AWS Certified Data Engineer
AWS Certified Solutions Architect
AWS Certified Data Analytics – Specialty
Minimum applicable experience (years):
5+ years' working experience
Required nature of experience:
Data Engineering development
Experience with AWS services used for data warehousing, computing and transformations (e.g., AWS Glue, AWS S3, AWS Lambda, AWS Step Functions, AWS Athena, AWS CloudWatch)
Experience with SQL and NoSQL databases (e.g., PostgreSQL, MySQL, DynamoDB)
Experience with SQL for querying and transforming data
Skills and Knowledge (essential):
Strong skills in Python (especially PySpark for AWS Glue)
Strong knowledge of data modeling, schema design and database optimization
Proficiency with AWS and infrastructure as code (see the sketch after this list)
Skills and Knowledge (desirable):
Knowledge of SQL, Python and AWS serverless microservices
Deploying and managing ML models in production
Version control (Git), unit testing and agile methodologies
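For illustration, a minimal infrastructure-as-code sketch of the kind referenced above, assuming AWS CDK v2 for Python; the stack and bucket names are placeholders, not hearX conventions:

```python
# Minimal AWS CDK v2 sketch (Python) of data-lake infrastructure as code.
# All resource names are illustrative assumptions.
import aws_cdk as cdk
from aws_cdk import aws_s3 as s3
from constructs import Construct

class DataLakeStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Encrypted, versioned bucket for raw landing data,
        # with all public access blocked by default.
        s3.Bucket(
            self,
            "RawDataBucket",
            versioned=True,
            encryption=s3.BucketEncryption.S3_MANAGED,
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
        )

app = cdk.App()
DataLakeStack(app, "DataLakeStack")
app.synth()
```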
Data Architecture and Management 20%:
Design and maintain scalable data architectures using AWS services (e.g., AWS S3, AWS Glue, AWS Athena)
Implement data partitioning and cataloging strategies to enhance data organization and accessibility (illustrated in the sketch after this list)
Work with schema evolution and versioning to ensure data consistency
Develop and manage metadata repositories and data dictionaries
Assist with the definition, setup and maintenance of data access roles and privileges
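For illustration, a minimal PySpark sketch of the partitioning strategy referenced above, assuming a raw events dataset with an event_date column; all bucket and column names are placeholders:

```python
# Minimal PySpark sketch of a date-partitioned S3 layout; the bucket,
# dataset and column names are illustrative assumptions, not hearX specifics.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("partitioning-example").getOrCreate()

# Assumes raw events carry an event_date column to derive partitions from.
events = (
    spark.read.parquet("s3://example-raw-bucket/events/")
    .withColumn("year", F.year("event_date"))
    .withColumn("month", F.month("event_date"))
)

# Partitioning on year/month keeps Athena scans (and cost) proportional
# to the date range queried rather than to the whole dataset.
(events.write
    .mode("overwrite")
    .partitionBy("year", "month")
    .parquet("s3://example-curated-bucket/events/"))
```

Registering the partitioned location in the Glue Data Catalog (for example via a crawler) then makes it directly queryable from Athena.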
Pipeline Development and ETL 30%:
Design, develop and optimize scalable ETL pipelines using batch and real-time processing frameworks such as AWS Glue with PySpark (see the sketch at the end of this list)
Implement data extraction, transformation and loading processes from various structured and unstructured sources
Optimize ETL jobs for performance, cost efficiency and scalability
Develop and integrate APIs to ingest and export data between various source and target systems, ensuring seamless ETL workflows
Enable scalable deployment of ML models by integrating data pipelines with ML workflows
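For illustration, a minimal AWS Glue (PySpark) ETL sketch of the extract-transform-load pattern described above; it runs only inside a Glue job, and the database, table, column and path names are placeholders:

```python
# Minimal AWS Glue (PySpark) job sketch: catalog read, mapping, S3 write.
# Database, table, column and path names are illustrative assumptions.
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Extract: read the source table registered in the Glue Data Catalog.
source = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="raw_events"
)

# Transform: rename and retype columns with a declarative mapping.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("id", "string", "event_id", "string"),
        ("ts", "string", "event_time", "timestamp"),
    ],
)

# Load: write curated Parquet back to S3.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/events/"},
    format="parquet",
)
job.commit()
```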
Automation, Monitoring and Optimization 30%:
Automate data workflows and ensure they are fault tolerant and optimized
Implement logging, monitoring and alerting for data pipelines (see the sketch after this list)
Optimize ETL job performance by tuning configurations and analyzing resource usage
Optimize data storage solutions for performance, cost and scalability
Ensure AWS resources are optimized for scalable data ingestion and output
Deploy machine learning models into production using cloud-based services such as Amazon SageMaker
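For illustration, a minimal boto3 sketch of the monitoring and alerting duty referenced above, using a custom CloudWatch metric and alarm; the namespace, metric, account and topic names are placeholders:

```python
# Minimal boto3 sketch of custom pipeline metrics plus an alert.
# Namespace, metric, account and topic names are illustrative assumptions.
import boto3

cloudwatch = boto3.client("cloudwatch")

# Emit a custom metric from within a pipeline run (e.g., failed records).
cloudwatch.put_metric_data(
    Namespace="ExamplePipeline",
    MetricData=[{"MetricName": "FailedRecords", "Value": 3, "Unit": "Count"}],
)

# Alarm as soon as any failures appear; AlarmActions would point
# at a real SNS topic in practice.
cloudwatch.put_metric_alarm(
    AlarmName="example-pipeline-failed-records",
    Namespace="ExamplePipeline",
    MetricName="FailedRecords",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:af-south-1:123456789012:example-alerts"],
)
```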
Security, Compliance and Best Practices 10%:
Ensure API security, authentication and access control best practices
Implement data encryption, access control and compliance with GDPR, HIPAA, SOC 2, etc. (see the sketch after this list)
Establish data governance policies, including access control and security best practices
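For illustration, a minimal boto3 sketch of the encryption and access-control duties referenced above; the bucket name and KMS key alias are placeholders:

```python
# Minimal boto3 sketch of bucket-level encryption and public-access controls;
# the bucket name and KMS key alias are illustrative assumptions.
import boto3

s3 = boto3.client("s3")
bucket = "example-data-bucket"

# Default-encrypt all new objects with a customer-managed KMS key.
s3.put_bucket_encryption(
    Bucket=bucket,
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": "alias/example-data-key",
            }
        }]
    },
)

# Block every form of public access at the bucket level.
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```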
Development Team Mentorship and Collaboration 5%:
Work closely with data scientists, analysts and business teams to understand data needs
Collaborate with backend teams to integrate data pipelines into CI/CD
Provide developmental leadership to the team through coaching, code reviews and mentorship
Ensure technological alignment with B2C division strategy supporting overarching hearX strategy and vision
Identify and encourage areas for growth and improvement within the team
QMS and Compliance 5%:
Document data processes, transformations and architectural decisions
Maintain high standards of software quality within the team by adhering to good processes, practices and habits, including compliance with the QMS and with data and system security requirements
Ensure compliance with the established processes and standards for the development lifecycle, including but not limited to data archival
Drive compliance with the hearX Quality Management System in line with the Quality Objectives, Quality Manual, and all processes related to the design, development and implementation of software related to medical devices
Comply with ISO, CE, FDA (and other) standards and requirements as applicable to assigned products
Safeguard confidential information and data
This job description is not a definitive or exhaustive list of responsibilities and is subject to change depending on changing business requirements. Employees will be consulted on any changes. Employee's performance will be reviewed based on the agreed-upon objectives. If you do not hear from us within 30 days, please consider your application unsuccessful.