Company
Global disruptor in the autonomy space, raising the industry standard for drones.
Products have received a multitude of prestigious awards.
Growing exponentially, with the opportunity to gain meaningful equity in the lead-up to the IPO phase.
What you'll do
Be responsible for AI framework and training optimization in the cloud.
Design and implement state-of-the-art deep learning platforms for large-scale distributed training and inference.
Develop reliable, scalable, and easy-to-use components that improve the user experience and productivity of the deep learning infrastructure.
Deep dive into the root causes of deep learning infrastructure failures and design clean solutions that improve infrastructure stability.
Profile deep learning training code to identify performance bottlenecks and devise solutions that improve performance.
Work closely with cross-functional teams to deliver new features on time and with high quality.
What you'll need to succeed
Must have:
Master's or doctoral degree, with more than 3 years of experience in deep learning platform development and MLOps development.
Experience developing large-scale distributed training of computer vision deep learning models on GPU clusters, including experience optimizing training efficiency.
Deep understanding of the internals of deep learning frameworks such as PyTorch.
Experience with GPU acceleration in distributed training and model deployment.
Familiarity with the C++/Go/Python/Java programming languages.
Nice to have:
ML infrastructure or ML tooling experience in a robotics field such as autonomous driving.