Senior Research Engineer (Data)

Job description

$175,000 - $250,000 + Equity + Benefits + PTO
Palo Alto, CA - On-site


Are you passionate about scaling data systems that fuel state-of-the-art AI? Want to play a mission-critical role in training cutting-edge generative models by designing the data infrastructure they rely on?

This is a rare opportunity to join a top-tier AI startup as it continues to push the boundaries of what's possible in multimodal generative AI. You'll join a high-performing, research-driven team with significant funding and strong momentum, in a high-impact position at the intersection of research and infrastructure.

I'm working with a well-funded AI startup in Palo Alto that's scaling its Research Engineering division. They're looking for a Senior Research Engineer focused on data systems: someone who understands how critical clean, diverse, and scalable data pipelines are to generative model performance. If you're excited about building high-quality datasets and architecting systems that handle billions of tokens, this is your chance to make a huge impact.

In this role, you'll partner closely with researchers to build end-to-end data acquisition and processing pipelines. You'll source novel data types, design filtering and deduplication systems, integrate active learning techniques, and help steer research directions based on model gaps. It's a role that combines engineering, research, and strategy, all at serious scale.

You'll have direct technical impact in a fast-paced, research-driven environment alongside some of the brightest minds in AI, while continuing to develop both your technical skills and your career.

The Role

  • Architect and maintain scalable pipelines for sourcing, deduplicating, filtering, and preparing massive datasets for training.
  • Partner with research scientists to identify model gaps and improve dataset relevance and diversity.
  • Collaborate with annotation ops to enhance dataset quality through smart filtering strategies.
  • Integrate self-supervised active learning and other advanced data techniques to scale systems efficiently.
  • Contribute directly to the performance of cutting-edge video generation models and other generative systems.
  • Work on-site in Palo Alto, CA.

Ideal Candidate

  • Experience building large-scale data pipelines in domains like computer vision, NLP, robotics, or autonomous systems.
  • Strong Python skills and familiarity with deep learning frameworks such as PyTorch.
  • Experience working with large-scale data processing tools (e.g., SQL, Spark).
  • Solid understanding of distributed systems and performance-aware data infrastructure.
  • Proven track record of delivering robust data solutions in fast-paced, research-heavy environments.
  • Bonus: experience in data-centric AI, self-supervised learning, or active learning methods.