Job description
Senior Machine Learning Engineer (Distributed Systems)
$175,000–$250,000 + Equity + Benefits + PTO
Palo Alto, CA (on-site)
Are you a world-class engineer with a passion for scaling high-performance systems? Do you want to work at the cutting edge of generative AI infrastructure and help build the backbone of tomorrow's foundation models?
This is an incredible opportunity to work on state-of-the-art multimodal machine learning while benefiting from strong equity (significantly above market average) and excellent internal progression.
I'm working with a very well-funded AI startup with strong revenue that is expanding its top-tier Research Engineering team. They're focused on rethinking how multimodal foundation models are trained, pushing the limits of distributed computing, GPU efficiency, and end-to-end optimization.
As a Distributed Systems Engineer, you'll collaborate with research scientists to develop and scale core infrastructure that trains next-gen models on multi-thousand GPU clusters. You'll tackle real-world performance bottlenecks, design resilient distributed systems, and optimize everything from custom CUDA kernels to model inference pipelines.
This is a rare opportunity to have direct technical impact in a fast-paced, research-driven environment alongside some of the brightest minds in AI, while continuing to grow both your technical skills and your career.
The Role
- Architect and scale infrastructure for training large-scale models across massive GPU clusters
- Optimize training performance and hardware utilization end-to-end (Python, PyTorch, CUDA, Triton)
- Build systems for efficient workload distribution, fault tolerance, and job recovery
- Deploy optimized inference systems with a focus on high throughput and low latency
- Contribute to prototyping next-gen applications in multimodal generative AI
- On-site in Palo Alto, CA
Ideal Candidate
- Experience working with large-scale ML systems or high-performance computing
- Strong Python and PyTorch engineering background; deep understanding of training pipelines
- Proficient in distributed frameworks (DDP, FSDP, tensor/model parallelism)
- Expertise in GPU/CPU performance profiling (e.g., Nsight), CUDA and Triton optimization, and custom kernel development
- Strong generalist software engineering skills (e.g., C++, debugging, systems design)
- Bonus: experience with generative models (Transformers, Diffusion, GANs) and fast prototyping tools (Gradio, Docker)
