Create Spark Scala/PySpark jobs for data transformation and aggregation.
Produce unit tests for Spark transformations and helper methods (a sample test sketch follows the responsibilities list below).
Use Spark and Spark SQL to read Parquet data and create Hive tables via the Scala API (illustrated in the sketch after this list).
Work closely with Business Analysts to review test results and obtain sign-off.
Prepare necessary design and operations documentation for future use.
Perform peer code quality reviews and ensure compliance with quality standards.
Engage in hands-on coding, often in a pair programming environment.
Collaborate with teams to build quality code and ensure smooth production deployments.
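Below is a minimal sketch of the Parquet-to-Hive task described above, written against the Spark Scala API. The job name (AccountAggregationJob), input path, column names, target table, and database are illustrative assumptions, not part of the role description.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object AccountAggregationJob {

  // Aggregate transaction amounts per account; column names are illustrative.
  def aggregateByAccount(transactions: DataFrame): DataFrame =
    transactions
      .groupBy("account_id")
      .agg(
        sum("amount").as("total_amount"),
        count(lit(1)).as("txn_count")
      )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AccountAggregationJob")
      .enableHiveSupport() // needed so saveAsTable writes to the Hive metastore
      .getOrCreate()

    // Hypothetical input path; adjust to the actual environment.
    val transactions = spark.read.parquet("/data/raw/transactions")

    // Assumes the target database already exists in Hive.
    aggregateByAccount(transactions)
      .write
      .mode("overwrite")
      .saveAsTable("analytics.account_totals")

    spark.stop()
  }
}
```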
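A companion unit-test sketch for the transformation above, assuming ScalaTest and a local SparkSession, with the AccountAggregationJob object from the previous sketch on the classpath; the test data and expected values are likewise illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

class AccountAggregationJobTest extends AnyFunSuite {

  // A local SparkSession is sufficient for unit-testing pure DataFrame transformations.
  private val spark: SparkSession = SparkSession.builder()
    .appName("AccountAggregationJobTest")
    .master("local[2]")
    .getOrCreate()

  import spark.implicits._

  test("aggregateByAccount sums amounts and counts rows per account") {
    val input = Seq(
      ("A1", 100.0),
      ("A1", 50.0),
      ("A2", 10.0)
    ).toDF("account_id", "amount")

    val result = AccountAggregationJob.aggregateByAccount(input)
      .orderBy("account_id")
      .collect()

    assert(result.length == 2)
    assert(result(0).getString(0) == "A1")
    assert(result(0).getDouble(1) == 150.0)
    assert(result(0).getLong(2) == 2L)
    assert(result(1).getString(0) == "A2")
    assert(result(1).getDouble(1) == 10.0)
  }
}
```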
Requirements:
4-10 years of experience as a Hadoop Data Engineer, with strong expertise in Hadoop, Spark, Scala, PySpark, Python, Hive, Impala, CI/CD, Git, Jenkins, Agile methodologies, DevOps, and the Cloudera distribution.
Strong knowledge of data warehousing methodologies.
Minimum of 4 years of relevant experience in Hadoop and Spark/PySpark.
Strong understanding of enterprise data architectures and data models.
Experience in the core banking and finance domains.
Familiarity with Oracle, Spark Streaming, Kafka, and machine learning.
Cloud experience, particularly with AWS, is a plus.
Ability to develop applications on the Hadoop tech stack efficiently and effectively, ensuring on-time, in-specification, and cost-effective delivery.