Lead the development and optimization of large-scale language model pre-training, including model architecture design, parallel training strategies, and performance improvements.
Drive research and implementation of advanced LLM post-training techniques, including chain-of-thought tuning, preference alignment, and RL for reasoning.
Develop and optimize data collection and processing pipelines for model training, including deduplication, cleaning, and verification.
Design and implement solutions for model deployment, including inference optimization and scaling strategies.
Collaborate with cross-functional teams to apply LLM capabilities in various business scenarios, such as materials science.
Stay current with the latest developments in the field and contribute to the company's technical roadmap.
Qualifications
Master's or Ph.D. in Computer Science, AI, or a related field.
5+ years of experience in machine learning, with a specific focus on NLP and LLMs.
Strong understanding of transformer architectures and modern LLM families (e.g., BERT, GPT, T5).
Extensive experience with deep learning frameworks (e.g., PyTorch, TensorFlow, JAX).
Strong programming skills in Python and proficiency with ML tooling (e.g., Hugging Face, DeepSpeed).
A proven track record of training and optimizing large-scale language models (10B+ parameters) is preferred.
Experience with distributed training systems (e.g., Megatron-LM) and related optimization techniques is preferred.