Research Scientist: Pretraining

Location: United States
Compensation: $200-350k + equity
Consultant: Jake Pheasey
Job Type: Permanent
Date Posted: 11 March 2026
Research Scientist – Pretraining
Compensation: $200-350k + equity
Location: Bay Area (Onsite)
 
The Company
This is a rare opportunity to join a well-funded, research-first robotics and embodied AI lab at an early but high-momentum stage — backed by tier-1 investors and strategic partners at the forefront of AI infrastructure and compute.

The founding team carries exceptional research pedigree, with alumni from some of the world's most respected AI organisations. Together, the founders bring tens of thousands of academic citations in robotics and large-scale ML, representing some of the deepest technical credibility in the field.

The company's core thesis is that scaling real-world robot data, model size, and compute can unlock predictable, general improvements in robotic capability — analogous to what foundation models achieved for language and vision. The team is executing on this with both scientific rigour and real-world deployment on physical robots.

Backed by a major compute infrastructure partner alongside leading venture firms, the lab is well-resourced and operating with serious ambition. This is not incremental robotics — it's an attempt to reset what's possible.
 
The Role
As a Research Scientist focused on Pretraining, you will own the base intelligence layer for the company's robot foundation models. Your focus will be on large-scale pretraining across multimodal robotic data — pushing generalisation across tasks, embodiments, and environments.

This is a high-ownership role with direct research-to-product impact. You will be working on foundational problems alongside some of the most respected minds in AI and robotics.
 
Responsibilities
  • Design and execute large-scale pretraining runs for robot foundation models
  • Define architectures, objectives, and training curricula (transformer and diffusion-based)
  • Build scalable data mixtures and sampling strategies across petabyte-scale datasets
  • Guide and influence data collection strategy and sourcing
  • Run ablations to understand scaling laws, data quality, and architecture trade-offs
  • Collaborate closely with ML Infra and Systems teams to maximise cluster efficiency
  • Convert raw robotic interaction data into generalisable intelligence
 
Key Skills & Experience
  • Proven experience training large transformer or diffusion models at scale
  • Hands-on ownership of multi-node, multi-GPU distributed training
  • Deep understanding of optimisation dynamics and training failure modes
  • Strong PyTorch fundamentals; comfortable debugging end-to-end
  • Excited by first-principles work on general-purpose robot intelligence
  • Strong signals: top-tier research publications, prior frontier-lab experience (OpenAI, DeepMind, Anthropic, or equivalent), or evidence of building systems from the ground up
 
Why This Role
The problems being solved here — building general-purpose embodied intelligence at scale — are among the most important and least-solved in AI today. You will have the autonomy to shape foundational research directions, work on one of the highest talent-density teams in robotics, and see your work deployed on real physical robots.

The culture is research-driven but execution-focused: deep scientific rigour combined with a relentless focus on what actually works in the real world. It is demanding, intense, and designed for people who want to work next to the best in the field.

If you are motivated by high-impact work, deep technical ownership, and the chance to help define what general-purpose robots can do — this is worth a conversation.
 
Apply
Please apply via this listing or reach out directly. All applications are handled in strict confidence. 