Join our forward-thinking data team to drive the development of cutting-edge data solutions for Generative AI applications. We leverage data to empower our customers, fuel our own innovation, and support groundbreaking research. As a Senior Data Engineer, you will take a leadership role in designing, optimizing, and scaling our data infrastructure, ensuring that our solutions are not only robust but also capable of meeting the challenges of tomorrow.
Lead the design and implementation of scalable data architectures to handle large-scale datasets (terabytes of text) with a focus on storage, versioning, and documentation best practices.
Architect, develop, and oversee the maintenance of web services that enable efficient consumption of harvested data.
Partner with researchers, software engineers, and leadership to continuously refine data collection methodologies and identify new data opportunities.
Strategize and prepare large datasets for diverse Machine Learning use cases, with an emphasis on Generative AI.
Build, optimize, and automate advanced preprocessing pipelines tailored to specific applications, ensuring high performance and reliability.
Ensure data services are robust, scalable, and meet the needs of cross-functional teams developing new products on top of our data infrastructure.
Mentor and guide junior data engineers, fostering a culture of knowledge sharing and continuous improvement.
You have 7+ years of experience as a Data Engineer, with a proven track record of architecting large-scale data systems.
You are an expert in Python and proficient in at least one other programming language (e.g., Java, Scala, or Golang).
You possess a deep understanding of distributed systems, with a demonstrated ability to design and manage efficient data pipelines in both cloud and on-premises environments.
You have a strong software engineering background, with a focus on writing clean, maintainable, and well-documented code.
You excel at data wrangling, including advanced techniques for extracting, transforming, cleaning, and standardizing data from multiple sources.
You bring expertise in Generative AI use cases and understand the pivotal role of data in developing cutting-edge AI solutions.
You have experience driving projects, influencing stakeholders, and aligning data strategies with broader business goals.
Experience working in multi-cloud environments (e.g., GCP, Azure, AWS) as well as with on-premises data solutions.
Background in Machine Learning or Data Science, with a particular focus on applying data engineering principles to AI research.
Proficiency in Golang and an interest in adopting new technologies.
Familiarity with Kubernetes for container orchestration and managing scalable deployments.