Amazing work!!
Hi, thanks for sharing your ForceVLA fork and Hugging Face resources.
I am trying to understand whether your checkpoint tshiamor/forcevla-sfp-sc-data24 can be used as a working ForceVLA inference/checkpoint-loading reference.
I have installed the official ft-robotic/ForceVLA repo and compared it with your fork. Your fork was very useful because it seems to fix the sample_actions() path by using prefix_out_fix instead of prefix_out, and by padding the LIMoE input sequence before calling self.limoe(...).
Could you clarify a few things?
- Which GitHub branch and commit should be used with tshiamor/forcevla-sfp-sc-data24?
- Which config name loads this checkpoint correctly?
- Is this checkpoint a full OpenPI/ForceVLA checkpoint, a LoRA checkpoint, or a train-state checkpoint?
- Does it require use_joint_state=True?
- What exact observation schema does it expect? For example, image keys, state layout, and action layout.
- Is the prefix_out_fix + LIMoE padding change required for inference with this checkpoint?
- Do you have a minimal command or script for running sample_actions() on one example observation?
I am trying to keep my local ForceVLA patch minimal and avoid copying dataset-specific changes unless they are required.
Thanks.
I'm glad you find the work helpful.
It was my attempt at learning ForceVLA while rushing out a solution, learned on the fly, for a competition (the Intrinsic AI for Industry Challenge). However, the qualification event has passed and I missed it, as I did not have a good dataset by the deadline.
This is a work in progress and a bit rough, as I'm still fixing several items in my spare time. The branch to look out for is develop (https://github.com/tshiamor/ForceVLA/tree/develop). The latest dataset I'm working on is https://huggingface.co/datasets/tshiamor/aic_gt_sfp_all_trimmed_v2 and its checkpoint is https://huggingface.co/tshiamor/forcevla-sfp-all-trimmed-v2 (I wouldn't use it yet until verified). There is also a version with joint states: https://huggingface.co/datasets/tshiamor/aic_gt_sfp_all_trimmed_v3 (ongoing). I think these two datasets are a bit more complete, and I would advise using those instead of forcevla-sfp-sc-data24.
I have created a guide as an overview, which also tries to answer some of your questions while putting things together: https://github.com/tshiamor/ForceVLA/blob/develop/GUIDE.md.
Thanks, this is really helpful. I am going through GUIDE.md and will test the v2 checkpoint as an inference reference. I am doing my Master's thesis on multi-geometry, multi-orientation peg-in-hole insertion on a Doosan M1013 robot. I have not started collecting data yet (I am planning on teleoperation on the real robot), but your work gives me hope that ForceVLA is something worth considering.
Three small clarifications:
For forcevla_sfp_all_trimmed_v2, are the 7D action deltas expressed in the robot base frame, TCP/local end-effector frame, or another task frame?
Is the wrench in observation.state.wrench expressed in the wrist sensor frame, TCP frame, or robot base frame?
Do you have a standalone non-ROS Python script that loads tshiamor/forcevla-sfp-all-trimmed-v2 and runs sample_actions() on one example observation/checkpoint? I want to test checkpoint loading without the AIC ROS stack first.
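To illustrate why the wrench frame question matters: the same physical contact produces different six-number readings depending on the frame it is expressed in. Below is a minimal, generic sketch in plain Python, assuming a hypothetical rotation R_bs and offset p_bs between a sensor frame S and some other frame B. These names and values are illustrative only and are not taken from the repo, the dataset, or the AIC stack.

```python
# Generic rigid-body wrench change of frame (illustrative; not from the ForceVLA repo).
# R_bs: 3x3 rotation giving the orientation of sensor frame S expressed in frame B.
# p_bs: position of the origin of S expressed in frame B (metres).

def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def transform_wrench(R_bs, p_bs, force_s, torque_s):
    """Re-express a wrench measured in S as a wrench about the origin of B."""
    force_b = mat_vec(R_bs, force_s)      # rotate the force into B
    torque_rot = mat_vec(R_bs, torque_s)  # rotate the torque into B
    lever = cross(p_bs, force_b)          # extra moment from the offset between origins
    torque_b = [torque_rot[i] + lever[i] for i in range(3)]
    return force_b, torque_b

# Hypothetical example: S is rotated 90 degrees about z relative to B
# and sits 0.1 m along B's z axis.
R_bs = [[0.0, -1.0, 0.0],
        [1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0]]
p_bs = [0.0, 0.0, 0.1]

# A pure 1 N force along the sensor x axis, with no measured torque.
force_b, torque_b = transform_wrench(R_bs, p_bs, [1.0, 0.0, 0.0], [0.0, 0.0, 0.0])
print(force_b, torque_b)
```

In this example a pure force along the sensor x axis shows up in frame B as a force along y plus a non-zero torque about x, so feeding the policy a wrench in a different frame than the one used in training would change all six components of observation.state.wrench.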