Staff Machine Learning Architect - Assembly Coding and Performance Engineer

Arm

Cambridge

On-site

GBP 60,000 - 80,000

Full time

30+ days ago

Job summary

An established industry player is seeking a passionate engineer to join their dynamic team focused on high-performance machine learning workloads. This role involves rapid prototyping of optimized CPU kernels to enhance model performance and accuracy, directly influencing future CPU architecture development. With a commitment to fostering a diverse and inclusive workplace, this innovative company offers an attractive relocation package and encourages talent to thrive. If you have a strong background in CPU architecture and kernel implementation, this is an exciting opportunity to contribute to groundbreaking projects in a collaborative environment.

Benefits

Attractive relocation package
Dynamic and inclusive workplace
Opportunities for professional growth

Qualifications

  • 4+ years of experience in high-performance CPU kernel implementation.
  • Strong understanding of CPU architecture and performance metrics.

Responsibilities

  • Analyze ML workloads and prototype optimized CPU kernels.
  • Drive model performance and accuracy through kernel development.

Skills

High-performance kernel code implementation
CPU kernel experience
Performance measurement and analysis
Kernel code development framework
Deep understanding of CPU architecture

Education

Bachelor's degree in Computer Science or related field
Advanced degree in Computer Architecture or Software

Tools

Development tools for kernel optimization

Job description

Job Overview:

High-performance ML workloads on Arm CPUs require the co-development of algorithms and highly optimized CPU kernels. In CT-ML (Central Technology, Machine Learning), rapid kernel prototyping is crucial for exploring algorithms and assessing trade-offs between model accuracy and performance. Successful prototypes are essential for driving future CPU architecture development and also serve as deliverables to Central Engineering for final production.

Responsibilities:

This position is part of a dedicated team within the CT-ML group that focuses on analyzing ML workloads and rapidly prototyping highly optimized CPU kernels to drive model performance and accuracy.

Required Skills and Experience:
  • Strong interest in, and passion for, implementing high-performance kernel code in a dynamic environment.
  • 4+ years of experience implementing high-performance CPU kernels with vector and matrix extensions.
  • Experience measuring and understanding performance.
  • Experience in creating an efficient kernel code development framework including tools and testing.
  • Deep understanding of CPU architecture.

“Nice To Have” Skills and Experience:
  • Knowledge of ML models and algorithms is a plus.
  • Advanced degree or equivalent experience in Computer Architecture and Software is a plus.

In Return:

Arm is committed to global talent acquisition, offering an attractive relocation package. With offices around the world, Arm is a diverse organization of dedicated, creative and highly talented engineers. By enabling a dynamic, inclusive, meritocratic, and open workplace, where all our people can grow and succeed, we encourage our people to share their unrivaled contributions to Arm's success in the global marketplace.
