Quadruped locomotion provides a natural setting for studying when model-free learning can outperform model-based control design: by exploiting patterns in data, it bypasses the difficulty of optimizing over discrete contacts and mode changes. We give a principled analysis of why imitation learning for quadrupeds can be effective in the small-data regime, based on the structure of limit cycles, Poincaré return maps, and local numerical properties of neural networks. This analysis motivates a new imitation learning method, Latent Variation Regularization (LVR), which aligns the distribution of variations in a latent representation space with the variations in the output actions. Hardware experiments confirm that a few seconds of demonstration are sufficient to train locomotion policies from scratch, entirely offline, with reasonable robustness.
Two representative trials trained from the same 10-second demonstration.
Learned behaviors spanning distinct gait styles.
Policies trained on data from a single source domain remain stable under domain shifts.
Starting from only a few seconds of expert observation–action data, we train a neural locomotion policy fully offline. Our key idea, Latent Variation Regularization (LVR), aligns local changes in hidden features with local changes in expert actions, so the network captures not only the action labels themselves but also their underlying variation structure. This encourages a more structured latent space and leads to more stable and robust quadruped walking in the small-data regime.
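The alignment idea above can be made concrete with a small loss term. The following is a minimal sketch, not the paper's actual implementation: it assumes LVR-style regularization can be approximated by penalizing the mismatch between pairwise variation magnitudes in the latent space and in the action space, computed over a batch of expert samples. The function name `lvr_loss` and the normalization scheme are illustrative assumptions.

```python
import numpy as np


def lvr_loss(latents, actions):
    """Hypothetical sketch of a Latent-Variation-Regularization-style term.

    For every pair of expert samples (i, j), the variation in latent
    features z_i - z_j should mirror the variation in expert actions
    a_i - a_j. Here we compare the two pairwise distance matrices after
    normalizing each to [0, 1], so the penalty is scale-invariant.
    """
    # Pairwise differences over all (i, j) pairs via broadcasting.
    dz = latents[:, None, :] - latents[None, :, :]  # (n, n, latent_dim)
    da = actions[:, None, :] - actions[None, :, :]  # (n, n, action_dim)

    # Variation magnitudes in each space.
    dist_z = np.linalg.norm(dz, axis=-1)
    dist_a = np.linalg.norm(da, axis=-1)

    # Normalize each distance matrix so only the variation *structure*
    # matters, not the absolute scale of either space.
    dist_z = dist_z / (dist_z.max() + 1e-8)
    dist_a = dist_a / (dist_a.max() + 1e-8)

    # Mean squared mismatch over all sample pairs.
    return float(np.mean((dist_z - dist_a) ** 2))
```

In training, a term like this would be added to the standard behavior-cloning loss, so the encoder is pushed to organize its hidden features around the same variation structure as the demonstrated actions.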