Quadruped locomotion provides a natural setting for studying when model-free learning can outperform model-based control design: by exploiting patterns in data, it bypasses the difficulty of optimizing over discrete contacts and mode changes. We give a principled analysis of why imitation learning for quadrupeds can be effective in the small-data regime, based on the structure of limit cycles, Poincaré return maps, and local numerical properties of neural networks. This analysis motivates a new imitation learning method, Latent Variation Regularization (LVR), which aligns the distribution of variations in a latent representation space with the variations in the output actions. Hardware experiments confirm that a few seconds of demonstration are sufficient to train locomotion policies from scratch, entirely offline, with reasonable robustness.
Two representative trials trained from the same 10-second demonstration.
Learned behaviors spanning distinct gait styles.
Policies trained on data from a single source domain remain stable under domain shifts.
Starting from only a few seconds of expert observation–action data, we train a neural locomotion policy fully offline. Our key idea, Latent Variation Regularization (LVR), aligns local changes in hidden features with local changes in expert actions, so the network captures not only the action labels themselves but also their underlying variation structure. This encourages a more structured latent space and leads to more stable and robust quadruped walking in the small-data regime.
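The alignment idea above can be made concrete with a small loss term. The following is a minimal sketch, not the paper's actual implementation: it assumes LVR-style regularization can be approximated by penalizing the mismatch between pairwise variation magnitudes in the latent space and in the action space, computed over a batch of expert samples. The function name `lvr_loss` and the normalization scheme are illustrative assumptions.

```python
import numpy as np


def lvr_loss(latents, actions):
    """Hypothetical sketch of a Latent-Variation-Regularization-style term.

    For every pair of expert samples (i, j), the variation in latent
    features z_i - z_j should mirror the variation in expert actions
    a_i - a_j. Here we compare the two pairwise distance matrices after
    normalizing each to [0, 1], so the penalty is scale-invariant.
    """
    # Pairwise differences over all (i, j) pairs via broadcasting.
    dz = latents[:, None, :] - latents[None, :, :]  # (n, n, latent_dim)
    da = actions[:, None, :] - actions[None, :, :]  # (n, n, action_dim)

    # Variation magnitudes in each space.
    dist_z = np.linalg.norm(dz, axis=-1)
    dist_a = np.linalg.norm(da, axis=-1)

    # Normalize each distance matrix so only the variation *structure*
    # matters, not the absolute scale of either space.
    dist_z = dist_z / (dist_z.max() + 1e-8)
    dist_a = dist_a / (dist_a.max() + 1e-8)

    # Mean squared mismatch over all sample pairs.
    return float(np.mean((dist_z - dist_a) ** 2))
```

In training, a term like this would be added to the standard behavior-cloning loss, so the encoder is pushed to organize its hidden features around the same variation structure as the demonstrated actions.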