What to Expect
As a Software Engineer on the Autopilot AI/ML Infrastructure team, you will reinforce, optimize, and scale our neural network training and auto-labeling infrastructure for both Autopilot and the Humanoid robot.
At the core of our autonomy capabilities are multiple neural networks that the Deep Learning team is designing to train on very large amounts of data across large-scale GPU clusters and, soon, our supercomputer Dojo.
Robustly training networks at scale, whether for production models or quick experiments, and completing that training in the shortest possible time is critical to our mission.
We are building out the machine learning platform that our team uses to schedule, track, and monitor their jobs, datasets and artifacts.
What You’ll Do
Connect our machine learning code to databases, and these databases to the frontend.
Our machine learning engineers and leadership use this stack to schedule, launch, monitor, track and debug experiments, jobs and models.
Build the platform of tools and infrastructure that the machine learning team needs to be effective.
This work spans everything from the machine learning code in Python to back-end and front-end work in JavaScript.
Coordinate required hardware resources with the team managing the cluster hardware to maintain high availability.
Work with the Machine Learning team directly to understand requirements and priorities.
What You’ll Bring
Strong knowledge of Python, React, and Linux.
Solid understanding of security principles and best practices.
Experience working with backend infrastructure (SQL, Redis, message brokers, etc.).
Experience building modern web applications using Flask/Django and React/Redux or similar component-based libraries.
UI and graphic design sensibilities.
Experience deploying services on Kubernetes and setting up CI/CD flows.
Experience working with HPC clusters is a plus.
Knowledge of machine learning, computer vision, or neural networks is a plus.