4 Out-Of-Distribution (OOD) Generalization

This section evaluates the primary Test-Time Adaptation claims. We compare the base Flow-Matched model against a Domain Randomization (DR) baseline across both Nominal dynamics and severe Out-Of-Distribution (OOD) physics (e.g., payload drops and friction spikes).

4.1 Survival Rates & Agility

Standard Domain Randomization often trades nominal agility for OOD generalization by forcing the robot into a slow, conservative gait. This experiment tests whether Reversible Flow Adaptation maintains high agility in Nominal environments while still surviving OOD environments via its online 2 Hz reflex loop.

4.2 Reward Degradation

A more granular metric than binary survival is the total accumulated reward, which penalizes slow gaits and high energy usage.
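To make the metric concrete, the following is a minimal sketch of how such an accumulated reward could be computed for one episode. The per-step terms, the weights, and the function name `episode_return` are hypothetical placeholders chosen for illustration; they are not the reward function used in this evaluation.

```python
def episode_return(speeds, powers, target_speed=1.0, w_speed=1.0, w_energy=0.01):
    """Accumulate a per-step reward over one episode.

    speeds: forward speed at each step (m/s), hypothetical input
    powers: mechanical power at each step (W), hypothetical input
    All weights and terms are illustrative assumptions.
    """
    total = 0.0
    for v, p in zip(speeds, powers):
        # Penalize only the shortfall below the target speed (slow gaits).
        speed_penalty = w_speed * max(0.0, target_speed - v)
        # Penalize energy usage in proportion to power draw.
        energy_penalty = w_energy * p
        # The constant 1.0 is a per-step alive bonus, so episodes that
        # terminate early (e.g., after a fall) accumulate less reward.
        total += 1.0 - speed_penalty - energy_penalty
    return total


# Same power draw, different speeds: the slower gait earns a lower return.
fast = episode_return([1.0, 1.0, 1.0], [10.0, 10.0, 10.0])
slow = episode_return([0.5, 0.5, 0.5], [10.0, 10.0, 10.0])
```

Under these assumptions a policy that survives but walks slowly still scores below an agile one, which is the distinction that binary survival cannot capture.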

4.2.1 Statistical Analysis & Interpretation

Interpretation: The survival rates indicate that Reversible Flow Adaptation maintains high completion rates in nominal conditions while mitigating catastrophic failures in OOD environments more effectively than Domain Randomization. The reward distribution analysis provides a continuous measure of this performance gap.

Welch two-sample t-tests, which do not assume equal variances between groups, assess whether the mean reward differs significantly between the two methods. A statistically significant advantage (\(p < 0.05\)) in nominal reward for Reversible Flow supports the claim that the baseline Domain Randomization policy adopts conservative, suboptimal gaits even under nominal dynamics. Likewise, a significant advantage in OOD reward indicates that Flow Matching test-time adaptation helps bridge the sim-to-real dynamics gap when mass and friction shift unexpectedly, outperforming static generalization techniques.
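For reference, the Welch statistic can be sketched in a few lines using only the Python standard library. The function name `welch_t` and the sample returns below are hypothetical and purely illustrative; they are not data from this evaluation.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    na, nb = len(a), len(b)
    # Squared standard error of each sample mean (sample variance / n).
    va = variance(a) / na
    vb = variance(b) / nb
    # Test statistic: difference in means over the combined standard error.
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical per-episode returns, for illustration only.
flow_returns = [10.0, 12.0, 14.0]
dr_returns = [7.0, 8.0, 9.0, 10.0]
t_stat, dof = welch_t(flow_returns, dr_returns)
```

The p-value follows from a Student t-distribution with `dof` degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the p-value directly. Because the nominal and OOD comparisons are two separate tests, a multiple-comparison correction may be worth considering when reporting both.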