Research Scientist: Pretraining

Location: United States
Compensation: $200-350k + equity
Consultant: Jake Pheasey
Job Type: Permanent
Date Posted: 11 March 2026
Research Scientist – Pretraining
Compensation: $200-350k + equity
Location: Bay Area (Onsite)
 
The Company
This is a rare opportunity to join a well-funded, research-first robotics and embodied AI lab at an early but high-momentum stage — backed by tier-1 investors and strategic partners at the forefront of AI infrastructure and compute.

The founding team carries exceptional research pedigree, with alumni from some of the world's most respected AI organisations. Together, the founders bring tens of thousands of academic citations in robotics and large-scale ML, representing some of the deepest technical credibility in the field.

The company's core thesis is that scaling real-world robot data, model size, and compute can unlock predictable, general improvements in robotic capability — analogous to what foundation models achieved for language and vision. The team is executing on this with both scientific rigour and real-world deployment on physical robots.

Backed by a major compute infrastructure partner alongside leading venture firms, the lab is well-resourced and operating with serious ambition. This is not incremental robotics — it's an attempt to reset what's possible.
 
The Role
As a Research Scientist focused on Pretraining, you will own the base intelligence layer for the company's robot foundation models. Your focus will be on large-scale pretraining across multimodal robotic data — pushing generalisation across tasks, embodiments, and environments.

This is a high-ownership role with direct research-to-product impact. You will be working on foundational problems alongside some of the most respected minds in AI and robotics.
 
Responsibilities
  • Design and execute large-scale pretraining runs for robot foundation models
  • Define architectures, objectives, and training curricula (transformer and diffusion-based)
  • Build scalable data mixtures and sampling strategies across petabyte-scale datasets
  • Guide and influence data collection strategy and sourcing
  • Run ablations to understand scaling laws, data quality, and architecture trade-offs
  • Collaborate closely with ML Infra and Systems teams to maximise cluster efficiency
  • Convert raw robotic interaction data into generalisable intelligence
 
Key Skills & Experience
  • Proven experience training large transformer or diffusion models at scale
  • Hands-on ownership of multi-node, multi-GPU distributed training
  • Deep understanding of optimisation dynamics and training failure modes
  • Strong PyTorch fundamentals; comfortable debugging end-to-end
  • Excited by first-principles work on general-purpose robot intelligence
  • Strong signals: top-tier research publications, prior frontier-lab experience (OpenAI, DeepMind, Anthropic, or equivalent), or evidence of building systems from the ground up
 
Why This Role
The problems being solved here — building general-purpose embodied intelligence at scale — are among the most important and least-solved in AI today. You will have the autonomy to shape foundational research directions, work on one of the highest talent-density teams in robotics, and see your work deployed on real physical robots.

The culture is research-driven but execution-focused: deep scientific rigour combined with a relentless focus on what actually works in the real world. It is demanding, intense, and designed for people who want to work next to the best in the field.

If you are motivated by high-impact work, deep technical ownership, and the chance to help define what general-purpose robots can do — this is worth a conversation.
 
Apply
Please apply via this listing or reach out directly. All applications are handled in strict confidence. 