AI71 is an applied research team dedicated to creating helpful and responsible AI agents for knowledge workers.
Working closely with our industry partners, our cross-functional teams of AI experts build products grounded in the cutting-edge research of our colleagues from the Technology Innovation Institute (TII).
Job Description:
Are you a seasoned DevOps Engineer with a passion for AI and a track record of deploying, automating, and maintaining AI systems? Can you optimize workflows while keeping AI infrastructure reliable, scalable, and secure? As our DevOps Engineer at AI71, you'll play a critical role in shaping and delivering cutting-edge solutions that redefine industries and create transformative impact.
What You'll Do:
Manage and automate the deployment of AI systems to production environments.
Implement and maintain continuous integration and continuous deployment (CI/CD) pipelines.
Monitor system performance, troubleshoot issues, and ensure the reliability of production environments.
Collaborate with development teams to streamline and optimize the development workflow.
Implement and manage infrastructure as code (IaC) for scalability and consistency.
Ensure security and compliance in all aspects of the deployment pipeline.
Stay informed about emerging technologies and best practices in DevOps to drive improvements in processes.
What You'll Bring:
5+ years of experience in DevOps or a related field.
Proficiency in Python and machine learning frameworks (e.g., TensorFlow, PyTorch, scikit-learn).
Experience with data versioning and experiment tracking tools (e.g., DVC, MLflow).
Proven track record of successfully deploying machine learning models in production environments.
Experience with feature stores and model registries.
Familiarity with machine learning experiment tracking and reproducibility.
Knowledge of infrastructure-as-code tools (e.g., Terraform, CloudFormation).
Proficiency in scripting languages (e.g., Shell, Bash).
Experience with containerization and orchestration tools (e.g., Docker, Kubernetes).
Knowledge of cloud platforms (e.g., AWS, Azure, GCP).
Familiarity with CI/CD tools (e.g., Jenkins, GitLab CI).
Strong problem-solving and troubleshooting skills.
Solid understanding of DevOps principles and practices.
Nice-to-have Experience:
Contributions to open-source ML and MLOps projects.
Experience with advanced ML model optimization techniques.
Knowledge of model interpretability and explainability tools.
Understanding of statistical modeling and machine learning algorithms.
Experience with edge ML deployment and IoT systems.
Why AI71:
Proven performance of our large language models.
Strong traction and adoption from the open-source community.
Proprietary data secured to build specialized, distinctive models.
Large-scale compute capacity secured to support our roadmap.
Anchor clients signed to develop proofs of concept (POCs) and demonstrate our solutions.