---
license: other
license_name: slab-license
license_link: LICENSE
datasets:
- hzxie/DOM
base_model:
- HuggingFaceTB/SmolLM2-360M
pipeline_tag: robotics
tags:
- robotics
- lerobot
- dynamicvla
---
# Model Card for DynamicVLA
DynamicVLA is a vision-language-action model for dynamic object manipulation. It is designed to handle dynamic scenes that require fast perception, temporal anticipation, and continuous control.
This model is trained and evaluated using the official DynamicVLA codebase. For full setup, training, and benchmarking instructions, please refer to the repository README.
## How to Get Started with the Model
For a complete walkthrough, see the official DynamicVLA repository. Below is the short version for training and running inference/evaluation.
### Train from scratch

From the `PROJECT_ROOT/dynamic-vla` directory, run:
```bash
torchrun --nnodes=1 --nproc_per_node=8 --standalone run.py \
    -c configs/dynamicvla.yaml \
    -d hzxie/DOM
```
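The command above is a single-machine launch. If you scale training beyond one node, torchrun's standard multi-node flags replace `--standalone`; the sketch below is a hypothetical example, and the node count, rank, address, and port are placeholders rather than values from the DynamicVLA README.

```bash
# Hypothetical 2-node x 8-GPU launch; run once per node.
# NODE_RANK is 0 on the first node and 1 on the second; MASTER_ADDR/port are placeholders.
torchrun --nnodes=2 --nproc_per_node=8 \
    --node_rank=$NODE_RANK \
    --master_addr=$MASTER_ADDR --master_port=29500 \
    run.py \
    -c configs/dynamicvla.yaml \
    -d hzxie/DOM
```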
### Evaluate the policy / run inference
```bash
# 1. start evaluation server
python3 simulations/evaluate.py \
    --scene_dir ../scenes \
    --output_dir ../output/evaluation \
    --env_cfg ../test-envs.txt \
    --enable_cameras --headless -n 20 --save

# 2. run policy inference
python3 scripts/inference.py \
    -p /path/to/vla-checkpoint \
    -r euler -d -s
```
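The `-p` argument expects a local checkpoint directory. If you use released weights from the Hugging Face Hub instead of a checkpoint from your own training run, a plain `huggingface-cli download` works; note that the repository id and target directory below are placeholders, not this model's confirmed Hub path.

```bash
# Placeholder repo id and directory -- substitute the actual Hub path of the checkpoint.
huggingface-cli download <org>/DynamicVLA --local-dir checkpoints/dynamicvla

# Then point the inference script at the downloaded checkpoint directory.
python3 scripts/inference.py -p checkpoints/dynamicvla -r euler -d -s
```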