DoVLA-CIL: Counterfactual Intervention Lattices for Vision-Language-Action Learning

Status: Active Research (Breakthrough: Action Horizon Discovery)

🎯 Key Results

Horizon Bottleneck Discovery:

  • Baseline (h=4): Oracle 42.57% → Policy 29.67%
  • New (h=16): Oracle 94.76% → Policy 55-70%+ (projected)
  • 2.2× improvement from a single design-parameter change (the action horizon h)
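The headline ratios can be checked directly against the reported figures. This sketch only does arithmetic on the numbers above. The 55-70% policy range is a projection, not a measured result.

```python
# Reported figures from the Key Results (percent success).
oracle_h4, policy_h4 = 42.57, 29.67
oracle_h16 = 94.76
policy_h16_projected = (55.0, 70.0)  # projected range, not yet measured

# Ratio of the oracle ceilings between the two horizons.
oracle_ratio = oracle_h16 / oracle_h4

# Policy success that a 2.2x improvement over the h=4 policy would imply.
implied_policy = 2.2 * policy_h4

# Improvement factors spanned by the projected h=16 policy range.
low, high = (p / policy_h4 for p in policy_h16_projected)

print(f"oracle ratio: {oracle_ratio:.2f}x")
print(f"2.2x over h=4 policy: {implied_policy:.1f}%")
print(f"projected policy factor: {low:.2f}x to {high:.2f}x")
```

Both readings of "2.2×" are consistent. The oracle ceiling rises by about 2.23×, and 2.2× over the h=4 policy (about 65.3%) falls inside the projected 55-70% range.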

📊 Oracle Ceiling Verification (h=16)

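| Task      | Groups | Oracle (h=16) | Baseline (h=4) | Δ (pp) |
|-----------|--------|---------------|----------------|--------|
| PickCube  | 1,000  | 96.2%         | 37.4%          | +58.8  |
| PushCube  | 500    | 99.2%         | 67.8%          | +31.4  |
| StackCube | 500    | 89.4%         | 40.8%          | +48.6  |
| LiftPeg   | 500    | 92.8%         | 49.2%          | +43.6  |
| Total     | 2,500  | 94.76%        | 42.57%         | +52.2  |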
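The oracle Total is the group-weighted mean of the per-task rows. PickCube counts double because it has 1,000 groups. The sketch below recomputes that total and the per-task deltas from the table values.

Applying the same weighting to the h=4 column gives about 46.52%, not the listed 42.57%. The baseline total may therefore come from a differently sized evaluation. The sketch does not try to reconcile this.

```python
# (task, groups, oracle_h16, baseline_h4) copied from the table, in percent.
ROWS = [
    ("PickCube", 1000, 96.2, 37.4),
    ("PushCube", 500, 99.2, 67.8),
    ("StackCube", 500, 89.4, 40.8),
    ("LiftPeg", 500, 92.8, 49.2),
]


def weighted_rate(rows, col):
    """Group-weighted mean success rate of column `col` (2 = oracle, 3 = baseline)."""
    total_groups = sum(r[1] for r in rows)
    return sum(r[1] * r[col] for r in rows) / total_groups


# Per-task improvement in percentage points.
deltas = {name: round(oracle - base, 1) for name, _, oracle, base in ROWS}

oracle_total = weighted_rate(ROWS, 2)
baseline_weighted = weighted_rate(ROWS, 3)

print(deltas)
print(f"oracle (h=16), group-weighted: {oracle_total:.2f}%")
print(f"baseline (h=4), group-weighted over these rows: {baseline_weighted:.2f}%")
```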

🚀 Quick Start

# Clone repo
git clone https://huggingface.co/anhtld/vla
cd vla

# Setup environment
python -m venv .venv
source .venv/bin/activate
pip install -e .

# Run tests
pytest

# Generate CIL data (requires ManiSkill)
python scripts/generate_maniskill_lattice.py \
  --demo path/to/demo.h5 \
  --out outputs/cil_data \
  --horizon 16 \
  --k 16 \
  --num-groups 500

# Train policy
python scripts/train_hybrid_direct.py \
  --dataset outputs/cil_data \
  --out runs/policy \
  --epochs 50
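Before launching a generation job, you can turn the flags into rough rollout counts. This is a back-of-the-envelope sketch. It assumes, as a hypothesis not read from the script, that each of the `--num-groups` groups is one start state s₀, that each group holds `--k` interventions, and that each intervention is an action chunk of `--horizon` steps.

```python
def lattice_size(num_groups: int, k: int, horizon: int) -> dict:
    """Rough count of start states, rollouts and simulated action steps.

    Hypothetical mapping: one group = one s0, k interventions per group,
    each intervention = `horizon` action steps.
    """
    rollouts = num_groups * k
    return {
        "start_states": num_groups,
        "rollouts": rollouts,
        "action_steps": rollouts * horizon,
    }


# Flags from the generate_maniskill_lattice.py example above.
print(lattice_size(num_groups=500, k=16, horizon=16))
# {'start_states': 500, 'rollouts': 8000, 'action_steps': 128000}
```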

📁 Repository Structure

dovla_cil/
├── data/          # CIL dataset & loaders
├── models/        # DoVLA architecture variants
├── generation/    # ManiSkill lattice generation
├── eval/          # Evaluation & baselines
└── utils/         # Common utilities

scripts/
├── generate_maniskill_lattice.py  # Data generation
├── train_hybrid_direct.py         # Policy training
├── eval_maniskill_policy_rollout.py  # Online evaluation
└── slurm/         # SLURM cluster scripts

tests/             # Comprehensive test suite
docs/              # Documentation & reports

🔬 Methodology

CIL Paradigm:

  1. For each simulator state s₀, generate K action interventions.
  2. Execute each do(aᵢ) from s₀ and observe the physical outcome.
  3. Store (obs, instruction, action, next_obs, reward, success).
  4. Train the policy to select the best action from the counterfactual lattice.
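The four steps can be sketched on a toy problem. Everything here is hypothetical:

- `ToySim` is a 1-D stand-in for a ManiSkill scene.
- s₀ is restored with `copy.deepcopy` rather than a simulator state API.
- The reward (negative distance to the goal) and the success tolerance are illustrative choices.

The property the sketch demonstrates is that every intervention starts from an identical copy of s₀.

```python
import copy
import random
from dataclasses import dataclass


@dataclass
class ToySim:
    """Hypothetical 1-D simulator: a point that should reach x = 0."""
    x: float

    def observe(self) -> float:
        return self.x

    def step(self, a: float) -> None:
        self.x += a


@dataclass
class Record:
    obs: float
    instruction: str
    action: list
    next_obs: float
    reward: float
    success: bool


def build_lattice(sim, instruction, k, horizon, rng, tol=0.1):
    """Steps 1-3: apply K sampled action chunks, each from the same state s0."""
    s0 = copy.deepcopy(sim)  # snapshot of s0; the caller's sim is never stepped
    obs = s0.observe()
    lattice = []
    for _ in range(k):
        chunk = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]  # intervention a_i
        branch = copy.deepcopy(s0)  # do(a_i) always starts from s0
        for a in chunk:
            branch.step(a)
        nxt = branch.observe()
        lattice.append(Record(obs, instruction, chunk, nxt,
                              reward=-abs(nxt), success=abs(nxt) < tol))
    return lattice


def best_action(lattice):
    """Step 4 target: the intervention with the highest observed reward."""
    return max(lattice, key=lambda r: r.reward).action


lattice = build_lattice(ToySim(x=2.0), "move to the origin",
                        k=16, horizon=16, rng=random.Random(0))
print(len(lattice), sum(r.success for r in lattice), best_action(lattice)[:3])
```

The deep copy keeps the caller's simulator untouched. As a result, the K outcomes differ only because of the actions applied.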

Key Innovation: Same-state interventions provide a causal supervision signal. Traditional demonstrations are observational: they record only the action that was taken in each state.
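One way to see why same-state interventions carry a causal signal: within one lattice group the state is held fixed, so differences in outcome can be attributed to the action. An observational demonstration gives a single action per state, so there is nothing to compare against.

The sketch derives two possible training targets from one group:

- the index of the best intervention, usable as a classification label;
- mean-centred advantages.

Neither is claimed to be the loss DoVLA uses. Both are illustrative assumptions.

```python
from statistics import mean


def same_state_targets(rewards):
    """Turn the K rewards observed from one s0 into supervision targets.

    All rewards share the same start state, so comparing them isolates the
    effect of the action. Returns (index of the best action,
    per-action advantage relative to the group mean).
    """
    if not rewards:
        raise ValueError("a lattice group needs at least one intervention")
    baseline = mean(rewards)
    advantages = [r - baseline for r in rewards]
    best = max(range(len(rewards)), key=rewards.__getitem__)  # first maximum wins ties
    return best, advantages


# Hypothetical rewards for K = 4 interventions from a single s0.
best, adv = same_state_targets([0.0, 1.0, 0.0, 1.0])
print(best, adv)  # 1 [-0.5, 0.5, -0.5, 0.5]
```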

📈 Training Status

Current: h=16 policy training in progress (Job 14749139)

  • Expected completion: ~3 hours
  • Expected online-rollout policy success: 55-70%+
  • Baseline comparison: 29.67% (h=4 policy), a projected 2.2× improvement

🔄 Auto-Sync

This repo auto-syncs from the compute cluster every 5 minutes:

  • Source code: updated in real time
  • Results and reports: added as experiments complete
  • Large artifacts (checkpoints, data): uploaded when a milestone is completed
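The sync policy above splits files into two classes. Code and reports go out on every cycle, while large artifacts wait for a milestone. This is a minimal sketch of that rule. The file patterns and file names are hypothetical, since the daemon's actual rules are not documented here. If the upload goes through `huggingface_hub`, patterns like these could plausibly be passed as `ignore_patterns` to `HfApi.upload_folder`.

```python
from fnmatch import fnmatch

# Hypothetical patterns for "large artifacts" (checkpoints, generated data).
LARGE_ARTIFACT_PATTERNS = ["*.ckpt", "*.pt", "*.safetensors", "*.h5", "outputs/*"]


def should_upload(path: str, milestone_reached: bool) -> bool:
    """Return True if `path` should be included in the current sync.

    Small files sync on every cycle; files matching a large-artifact
    pattern are held back until a milestone is reached.
    """
    is_large = any(fnmatch(path, pat) for pat in LARGE_ARTIFACT_PATTERNS)
    return milestone_reached or not is_large


# Hypothetical file names for illustration.
files = [
    "dovla_cil/models/policy.py",
    "docs/report.md",
    "runs/policy/last.ckpt",
    "outputs/cil_data/group_000.h5",
]
print([f for f in files if should_upload(f, milestone_reached=False)])
# ['dovla_cil/models/policy.py', 'docs/report.md']
```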

Manual control of the sync daemon:

# On cluster
./scripts/hf_sync_daemon.sh start   # Start auto-sync
./scripts/hf_sync_daemon.sh status  # Check status
./scripts/hf_sync_daemon.sh stop    # Stop daemon

📄 Key Reports

🎓 Citation

Paper in preparation (target: ICLR/NeurIPS/CoRL 2027)

@misc{dovla2026,
  title={DoVLA: Discovering Action Horizon as the Bottleneck in Vision-Language-Action Learning},
  author={Tran Le Duc Anh},
  year={2026},
  note={In preparation}
}

📧 Contact

  • Author: Tran Le Duc Anh
  • HuggingFace: @anhtld

🔗 Links


Last Updated: 2026-06-25 (Auto-sync active)
Next Milestone: Online rollout evaluation (~3h)
