Behavior Foundation Model
for Humanoid Robots

1Peking University, 2The Chinese University of Hong Kong, Shenzhen, 3Shanghai Jiao Tong University, 4Fudan University, 5Shanghai Artificial Intelligence Laboratory

Video

Abstract

Whole-body control (WBC) of humanoid robots has witnessed remarkable progress in skill versatility, enabling a wide range of applications such as locomotion, teleoperation, and motion tracking. Despite these achievements, existing WBC frameworks remain largely task-specific, relying heavily on labor-intensive reward engineering and demonstrating limited generalization across tasks and skills. These limitations hinder their ability to respond to arbitrary control modes and restrict their deployment in complex, real-world scenarios. To address these challenges, we revisit existing WBC systems and identify a shared objective across diverse tasks: the generation of appropriate behaviors that guide the robot toward desired goal states. Building on this insight, we propose the Behavior Foundation Model (BFM), a generative model pretrained on large-scale behavioral datasets to capture broad, reusable behavioral knowledge for humanoid robots. BFM integrates a masked online distillation framework with a Conditional Variational Autoencoder (CVAE) to model behavioral distributions, thereby enabling flexible operation across diverse control modes and efficient acquisition of novel behaviors without retraining from scratch. Extensive experiments both in simulation and on a physical humanoid platform demonstrate that BFM generalizes robustly across diverse WBC tasks while rapidly adapting to new behaviors. These results establish BFM as a promising step toward a foundation model for general-purpose humanoid control.

Behavior Gallery


All the behaviors below are generated by our Behavior Foundation Model.


Swimming

Walk and Sit on the Ground

Walk and Sit on the Chair

Get up from the Ground

Roundhouse Kick

Basketball Layup

Forward Roll

Butterfly Kick

Cartwheel

Side Salto

Simple Locomotion with Remote Controller


Squatting Locomotion with Remote Controller

Whole-body Teleoperation with HybrIK


BFM Implementation


A proxy agent is first trained in simulation via large-scale motion imitation to prepare the large-scale behavioral dataset used for BFM pretraining.

The BFM is then implemented as a Conditional Variational Autoencoder with a versatile control interface derived from our mathematical formulation, and is pretrained with the masked online distillation paradigm, as sketched below.
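The following is a minimal PyTorch-style sketch of this pretraining step, illustrating how a CVAE policy could be distilled online from the proxy agent under random goal masking. The class and function names (CVAEPolicy, distillation_step), the network sizes, the per-element masking scheme, and the loss weighting are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of masked online distillation for BFM pretraining.
# All names (CVAEPolicy, distillation_step, mask_prob) are illustrative
# assumptions, not the authors' actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAEPolicy(nn.Module):
    """Conditional VAE: encodes (proprio, goal) into a latent z, decodes an action."""
    def __init__(self, proprio_dim, goal_dim, action_dim, latent_dim=64, hidden=512):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(proprio_dim + goal_dim, hidden), nn.ELU(),
            nn.Linear(hidden, 2 * latent_dim),          # outputs [mu, logvar]
        )
        self.decoder = nn.Sequential(
            nn.Linear(proprio_dim + latent_dim, hidden), nn.ELU(),
            nn.Linear(hidden, action_dim),
        )

    def encode(self, proprio, goal):
        mu, logvar = self.encoder(torch.cat([proprio, goal], dim=-1)).chunk(2, dim=-1)
        return mu, logvar

    def forward(self, proprio, goal):
        mu, logvar = self.encode(proprio, goal)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        action = self.decoder(torch.cat([proprio, z], dim=-1))
        return action, mu, logvar

def distillation_step(student, teacher, proprio, goal, mask_prob=0.5):
    """One masked online distillation step: the student imitates the proxy
    teacher's action while only observing a randomly masked goal."""
    with torch.no_grad():
        teacher_action = teacher(proprio, goal)           # privileged proxy agent
    # Randomly mask goal entries so the student learns to act under partial goals.
    mask = (torch.rand_like(goal) > mask_prob).float()
    student_action, mu, logvar = student(proprio, goal * mask)
    bc_loss = F.mse_loss(student_action, teacher_action)
    kl_loss = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return bc_loss + 1e-3 * kl_loss
```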


Figure 1: Overview of BFM Implementation

BFM Applications

Behavior Composition


BFM supports behavior composition through linear interpolation of the latent variables produced by two distinct control modes, generating novel behaviors that integrate features of both, as sketched below.
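A minimal sketch of this composition step, reusing the hypothetical CVAEPolicy from the sketch above; the function name compose_behaviors and the blending weight alpha are illustrative assumptions.

```python
# Sketch of behavior composition by interpolating the latent means of two
# control modes (e.g., root-velocity mode and keypoint-tracking mode).
import torch

def compose_behaviors(policy, proprio, root_goal, keypoint_goal, alpha=0.5):
    """Blend two control modes by linearly interpolating their latent means."""
    mu_root, _ = policy.encode(proprio, root_goal)      # root mode only
    mu_key, _ = policy.encode(proprio, keypoint_goal)   # keypoint mode only
    z = (1.0 - alpha) * mu_root + alpha * mu_key        # interpolated latent
    return policy.decoder(torch.cat([proprio, z], dim=-1))
```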


Root Mode Only

After Composition

Keypoint Mode Only

Behavior Modulation


BFM allows behavior modulation through linear extrapolation in the latent space, thereby enabling behavior generation that better aligns with the desired control mode.

$$z = (1+\lambda)\,\mu^{\rho}\!\left(s_t^{p,\mathrm{real}}, s_t^{g,\mathrm{real}}\right) - \lambda\,\mu^{\rho}\!\left(s_t^{p,\mathrm{real}}, \emptyset\right), \quad \lambda > 0$$
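A minimal sketch of this modulation rule, reusing the hypothetical CVAEPolicy from the sketch above and assuming the empty goal $\emptyset$ is represented by a zero vector:

```python
# Sketch of behavior modulation by latent extrapolation, following the
# equation above; function and argument names are illustrative assumptions.
import torch

def modulate_behavior(policy, proprio, goal, lam=0.5):
    """z = (1 + lambda) * mu(s_p, s_g) - lambda * mu(s_p, empty), lambda > 0."""
    mu_goal, _ = policy.encode(proprio, goal)                     # goal-conditioned mean
    mu_empty, _ = policy.encode(proprio, torch.zeros_like(goal))  # empty-goal mean
    z = (1.0 + lam) * mu_goal - lam * mu_empty                    # extrapolated latent
    return policy.decoder(torch.cat([proprio, z], dim=-1))
```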


Before Modulation

After Modulation

Efficient Behavior Acquisition


We adopt residual learning on top of our pretrained BFM, leveraging the broad behavioral knowledge it encodes to acquire novel behaviors efficiently; see the sketch below.
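A minimal sketch of this residual scheme, assuming the pretrained BFM follows the CVAEPolicy interface from the sketch above; the residual head architecture, the action scale, and the freezing strategy are illustrative assumptions, and the residual head itself would be optimized with RL on the downstream task.

```python
# Sketch of residual learning on top of a frozen pretrained BFM.
import torch
import torch.nn as nn

class ResidualPolicy(nn.Module):
    """Adds a small learned correction to the frozen BFM action."""
    def __init__(self, bfm, obs_dim, action_dim, hidden=256, scale=0.1):
        super().__init__()
        self.bfm = bfm
        for p in self.bfm.parameters():        # freeze the pretrained BFM
            p.requires_grad_(False)
        self.residual = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ELU(),
            nn.Linear(hidden, action_dim),
        )
        self.scale = scale

    def forward(self, proprio, goal):
        with torch.no_grad():
            base_action, _, _ = self.bfm(proprio, goal)   # behavior prior from BFM
        # Only the residual head receives gradients during downstream RL training.
        correction = self.residual(torch.cat([proprio, goal], dim=-1))
        return base_action + self.scale * correction
```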


Figure 2: Overview of Residual Learning on BFM

BFM ONLY

RL from Scratch

BFM + Residual Learning

Real Deployment