AI Platform Development Engineer

Location: United States of America
Contact name: Sam Jacobs

Contact email: sam@deepabacus.com
Job ref: 99
Published: 17 days ago

Company

  • Global disruptor in the autonomy space, raising the industry standard for drones.

  • Products have received a multitude of prestigious awards.

  • Growing exponentially, with the opportunity to gain meaningful equity in the lead-up to the IPO phase.

What you'll do

  • Take responsibility for AI framework and training optimization in the cloud.

  • Design and implement state-of-the-art deep learning platforms for large-scale distributed training and inference.

  • Develop reliable, scalable, and easy-to-use components that improve the user experience and productivity of the deep learning infra.

  • Dive deep into the root causes of deep learning infra failures and design clean solutions to improve infra stability.

  • Profile deep learning training code to identify performance bottlenecks and develop solutions that remove them.

  • Work closely with cross-functional teams to deliver new features on time with high quality.

What you'll need to succeed

Must have:

  • Master's or doctoral degree and more than 3 years of experience in deep learning platform development and MLOps development.

  • Experience developing large-scale distributed training of deep learning models for computer vision on GPU clusters, including training efficiency optimization.

  • Deep understanding of the internals of deep learning frameworks such as PyTorch.

  • Experience in GPU acceleration in distributed training and model deployment.

  • Familiarity with the C++/Go/Python/Java programming languages.

Nice to have:

  • ML infra/ML tooling experience in a robotics field such as autonomous driving.