DoVLA-CIL: Counterfactual Intervention Lattices for Vision-Language-Action Learning
Status: Active Research (Breakthrough: Action Horizon Discovery)
Key Results
Horizon Bottleneck Discovery:
- Baseline (h=4): Oracle 42.57% → Policy 29.67%
- New (h=16): Oracle 94.76% → Policy 55-70%+ (projected)
- 2.2× higher oracle ceiling (42.57% → 94.76%) from changing a single design parameter, the action horizon h
Oracle Ceiling Verification (h=16)
| Task | Groups | Oracle (h=16) | Oracle baseline (h=4) | Δ (pp) |
|---|---|---|---|---|
| PickCube | 1,000 | 96.2% | 37.4% | +58.8 |
| PushCube | 500 | 99.2% | 67.8% | +31.4 |
| StackCube | 500 | 89.4% | 40.8% | +48.6 |
| LiftPeg | 500 | 92.8% | 49.2% | +43.6 |
| Total | 2,500 | 94.76% | 42.57% | +52.2 |
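The h=16 Total is consistent with a group-weighted mean of the per-task oracle rates. The short check below reproduces it, assuming the Groups column gives the weights; the variable and function names are illustrative and not part of the repository.

```python
# Per-task (groups, oracle success % at h=16), copied from the table above.
oracle_h16 = {
    "PickCube": (1000, 96.2),
    "PushCube": (500, 99.2),
    "StackCube": (500, 89.4),
    "LiftPeg": (500, 92.8),
}
BASELINE_TOTAL_H4 = 42.57  # reported h=4 Total, taken as given


def weighted_success(rows):
    """Group-weighted mean success rate in percent (assumed aggregation)."""
    total = sum(n for n, _ in rows.values())
    return sum(n * pct for n, pct in rows.values()) / total


total_groups = sum(n for n, _ in oracle_h16.values())
oracle_total = weighted_success(oracle_h16)

print(total_groups)                                # 2500
print(round(oracle_total, 2))                      # 94.76
print(round(oracle_total - BASELINE_TOTAL_H4, 2))  # 52.19 percentage points
print(round(oracle_total / BASELINE_TOTAL_H4, 1))  # 2.2
```

The last line is where the 2.2× figure comes from: it is the ratio of the two oracle ceilings, not of the policy success rates.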
Quick Start
# Clone repo
git clone https://huggingface.co/anhtld/vla
cd vla
# Setup environment
python -m venv .venv
source .venv/bin/activate
pip install -e .
# Run tests
pytest
# Generate CIL data (requires ManiSkill)
python scripts/generate_maniskill_lattice.py \
--demo path/to/demo.h5 \
--out outputs/cil_data \
--horizon 16 \
--k 16 \
--num-groups 500
# Train policy
python scripts/train_hybrid_direct.py \
--dataset outputs/cil_data \
--out runs/policy \
--epochs 50
Repository Structure
dovla_cil/
├── data/ # CIL dataset & loaders
├── models/ # DoVLA architecture variants
├── generation/ # ManiSkill lattice generation
├── eval/ # Evaluation & baselines
└── utils/ # Common utilities
scripts/
├── generate_maniskill_lattice.py # Data generation
├── train_hybrid_direct.py # Policy training
├── eval_maniskill_policy_rollout.py # Online evaluation
└── slurm/ # SLURM cluster scripts
tests/ # Comprehensive test suite
docs/ # Documentation & reports
Methodology
CIL Paradigm:
- For each simulator state s, generate K action interventions
- Execute each do(aᵢ) from that same state and observe the physical outcome
- Store (obs, instruction, action, next_obs, reward, success) for every intervention
- Train the policy to select the best action from the counterfactual lattice
Key Innovation: Because all K interventions start from the same state, differences in outcome are attributable to the action alone. This gives a causal supervision signal that traditional observational demonstrations do not provide.
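The steps above can be sketched on a toy problem. The code below is a minimal illustration and not the repository's implementation. `ToyReachEnv`, `generate_group`, `best_action`, and `oracle_ceiling` are hypothetical names, and the 1-D environment stands in for a real simulator. It makes two assumptions: an intervention is a single velocity held for `horizon` steps, and the oracle ceiling is the fraction of groups in which at least one intervention succeeds.

```python
import random
from dataclasses import dataclass


@dataclass
class Transition:
    obs: float
    instruction: str
    action: float
    next_obs: float
    reward: float
    success: bool


class ToyReachEnv:
    """Hypothetical 1-D stand-in for a simulator: move a point to a goal."""

    def __init__(self, goal=1.0, tol=0.05, dt=0.1):
        self.goal, self.tol, self.dt = goal, tol, dt
        self.pos = 0.0

    def set_state(self, state):
        self.pos = state

    def step(self, velocity):
        self.pos += velocity * self.dt
        return self.pos


def generate_group(env, state, k, horizon, rng, instruction="reach the goal"):
    """Run K interventions do(a_i), all starting from the same state."""
    group = []
    for _ in range(k):
        env.set_state(state)             # reset to the shared start state
        action = rng.uniform(-1.0, 1.0)  # sampled candidate intervention
        next_obs = state
        for _ in range(horizon):         # assumption: hold the action for h steps
            next_obs = env.step(action)
        dist = abs(env.goal - next_obs)
        group.append(Transition(state, instruction, action, next_obs,
                                reward=-dist, success=dist <= env.tol))
    return group


def best_action(group):
    """Supervision target: the highest-reward intervention in the group."""
    return max(group, key=lambda t: t.reward)


def oracle_ceiling(groups):
    """Fraction of groups with at least one successful intervention."""
    return sum(any(t.success for t in g) for g in groups) / len(groups)


if __name__ == "__main__":
    rng = random.Random(0)
    env = ToyReachEnv()
    for h in (4, 16):
        groups = [generate_group(env, 0.0, k=16, horizon=h, rng=rng)
                  for _ in range(200)]
        print(f"h={h}: oracle ceiling = {oracle_ceiling(groups):.2f}")
```

In this toy setting the point can travel at most 0.4 units in 4 steps, so no sampled action reaches the goal at h=4 and the oracle ceiling is exactly 0. At h=16 the goal is within reach. This only illustrates how a short horizon can cap the best achievable outcome; it says nothing about the ManiSkill numbers reported here.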
Training Status
Current: h=16 policy training in progress (Job 14749139)
- Expected completion: ~3 hours
- Expected online rollout: 55-70%+ policy success
- Baseline comparison: 29.67% policy success at h=4 → roughly 2.2× improvement if the projection holds
Auto-Sync
This repo auto-syncs from the compute cluster every 5 minutes:
- Source code updates are pushed on each sync cycle
- Results and reports are added as experiments complete
- Large artifacts (checkpoints, data) are uploaded on milestone completion
Manual control of the sync daemon:
# On cluster
./scripts/hf_sync_daemon.sh start # Start auto-sync
./scripts/hf_sync_daemon.sh status # Check status
./scripts/hf_sync_daemon.sh stop # Stop daemon
Key Reports
- BREAKTHROUGH_SUMMARY.md - Horizon discovery
- ORACLE_CEILING_ROOT_CAUSE_VERIFICATION.md - Complete verification journey
- ROOT_CAUSE_ANALYSIS.md - Architecture analysis
Citation
Paper in preparation (target: ICLR/NeurIPS/CoRL 2027)
@misc{dovla2026,
title={DoVLA: Discovering Action Horizon as the Bottleneck in Vision-Language-Action Learning},
author={Tran Le Duc Anh},
year={2026},
note={In preparation}
}
Contact
- Author: Tran Le Duc Anh
- HuggingFace: @anhtld
Links
- Training Jobs
- Checkpoints (uploaded on completion)
- Reports
Last Updated: 2026-06-25 (Auto-sync active)
Next Milestone: Online rollout evaluation (~3h)