Job description
Research Engineer (Foundation Model Development)
$175,000 - $200,000 + Equity + Benefits
Palo Alto, CA - On-site
If you're passionate about building the next generation of large-scale, multimodal generative models, and you want to train them at scale across thousands of GPUs, this is the opportunity you've been waiting for.
I'm partnering with a well-backed AI research lab in Palo Alto that's scaling its ML research team. This group sits at the bleeding edge of generative AI, driving forward the development of highly performant, scalable multimodal models using massive compute infrastructure.
This is a chance to work alongside some of the best in the industry, contribute to core research, and build models that power real-world applications. You'll play a pivotal role in algorithm design, architecture experimentation, and large-scale training optimization, all in a fast-paced, mission-driven environment.
In this high-impact, high-autonomy role, you will lead and contribute to groundbreaking multimodal foundation model research, developing innovative algorithms that improve model performance and scalability. You will also optimize models for production, with a focus on efficiency, throughput, and robustness, and analyze and manage large data clusters to improve training pipelines.
This is a rare opportunity to have direct technical impact in a fast-paced, research-driven environment alongside some of the brightest minds in AI, while continuing to advance both your technical skills and your career.
The Role
- Architect and execute large-scale foundation model training across thousands of GPUs.
- Prototype and experiment with advanced generative AI architectures and algorithms.
- Optimize training throughput, model robustness, and system efficiency on distributed infrastructure.
- Analyze data bottlenecks, optimize I/O, and scale training pipelines on distributed infrastructure.
- Collaborate closely with researchers and engineers to deliver next-generation model capabilities.
Ideal Candidate
- Strong programming experience in Python and PyTorch.
- Proven ability to build machine learning models from scratch, with a deep grasp of Transformer-based architectures and generative models (e.g., diffusion models, GANs).
- Experience running models across 100+ GPUs in distributed environments.
- Comfortable working on Linux clusters with scripting and tooling to manage massive compute.
