Learning Quadruped Walking from Seconds of Demonstration
via Latent Variation Regularization
Ruipeng Zhang, Hongzhan Yu, Ya-Chien Chang, Chenghao Li, Henrik I. Christensen, Sicun Gao
University of California, San Diego
Overview illustration

Abstract

Quadruped locomotion provides a natural setting for understanding when model-free learning can outperform model-based control design, by exploiting data patterns to bypass the difficulty of optimizing over discrete contacts and mode changes. We give a principled analysis of why imitation learning with quadrupeds can be effective in the small-data regime, based on the structure of limit cycles, Poincaré return maps, and local numerical properties of neural networks. This understanding motivates a new imitation learning method, Latent Variation Regularization (LVR), which aligns variations in a latent representation space with variations in the output actions. Hardware experiments confirm that a few seconds of demonstration are sufficient to train locomotion policies from scratch, entirely offline, with reasonable robustness.

Pipeline

Performance

Trained on the same 10-second dataset

Two representative trials trained from the same 10-second demonstration.

Trial 1: Behavior Cloning vs. LVR
Trial 2: Behavior Cloning vs. LVR

Various gaits

Learned behaviors spanning distinct gait styles.

Gait A
Gait B

Trained on flat ground; robust across diverse domains

Policies trained on data from a single source domain remain stable under domain shifts.

Domain Shift 1: different terrain and conditions
Domain Shift 2: different terrain and conditions

Methodology

Method overview pipeline
Overview of LVR pipeline.

Starting from only a few seconds of expert observation–action data, we train a neural locomotion policy fully offline. Our key idea, Latent Variation Regularization (LVR), aligns local changes in hidden features with local changes in expert actions, so the network captures not only the action labels themselves but also their underlying variation structure. This encourages a more structured latent space and leads to more stable and robust quadruped walking in the small-data regime.
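The idea of aligning local latent variations with local action variations can be sketched as a simple auxiliary loss. The snippet below is an illustrative PyTorch sketch, not the paper's exact formulation: it penalizes mismatch between the (scale-normalized) pairwise distance structure of a batch of hidden features and that of the corresponding expert actions, alongside a standard behavior-cloning loss. The function name `lvr_loss` and the normalization choice are our assumptions for illustration.

```python
import torch


def lvr_loss(latents: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
    """Illustrative latent-variation regularizer (a sketch, not the official LVR).

    latents: (B, d_z) hidden features from the policy network.
    actions: (B, d_a) expert actions for the same batch of states.
    """
    # Pairwise distances capture local "variation structure" within the batch.
    dz = torch.cdist(latents, latents)  # latent variations, shape (B, B)
    da = torch.cdist(actions, actions)  # action variations, shape (B, B)
    # Normalize each distance matrix to unit mean so the two spaces,
    # which may have very different scales, become comparable.
    dz = dz / (dz.mean() + 1e-8)
    da = da / (da.mean() + 1e-8)
    # Penalize disagreement between latent and action variation patterns.
    return ((dz - da) ** 2).mean()


def total_loss(pred_actions, expert_actions, latents, lam=0.1):
    """Behavior cloning plus the variation-alignment regularizer."""
    bc = torch.nn.functional.mse_loss(pred_actions, expert_actions)
    return bc + lam * lvr_loss(latents, expert_actions)
```

The regularizer is zero when latent distances are proportional to action distances, which is the structured-latent-space property the paragraph above describes; in practice the weight `lam` would be tuned per task.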