Amazing work!!

#1
by argousaxthechaos - opened

Hi, thanks for sharing your ForceVLA fork and Hugging Face resources.

I am trying to understand whether your checkpoint tshiamor/forcevla-sfp-sc-data24 can be used as a working ForceVLA inference/checkpoint-loading reference.

I have installed the official ft-robotic/ForceVLA repo and compared it with your fork. Your fork was very useful because it seems to fix the sample_actions() path by using prefix_out_fix instead of prefix_out, and by padding the LIMoE input sequence before calling self.limoe(...).
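
As a rough illustration of the padding idea mentioned above (not the fork's actual code), the sketch below right-pads a token sequence to a fixed length and builds a matching validity mask before a fixed-shape module would be called. The name `pad_sequence`, the `target_len` parameter, and the array shapes are hypothetical, and NumPy stands in for the JAX arrays used in the repo.

```python
import numpy as np

def pad_sequence(tokens, target_len):
    """Right-pad a (batch, seq, dim) array with zeros up to target_len.

    Returns the padded array and a boolean mask that is True on real tokens.
    """
    batch, seq_len, dim = tokens.shape
    if seq_len > target_len:
        raise ValueError(f"sequence length {seq_len} exceeds target {target_len}")
    pad = target_len - seq_len
    # Pad only along the sequence axis; np.pad fills with zeros by default.
    padded = np.pad(tokens, ((0, 0), (0, pad), (0, 0)))
    mask = np.zeros((batch, target_len), dtype=bool)
    mask[:, :seq_len] = True
    return padded, mask

# Hypothetical shapes: batch of 2, 5 real tokens, embedding width 4.
tokens = np.ones((2, 5, 4), dtype=np.float32)
padded, mask = pad_sequence(tokens, target_len=8)
print(padded.shape, int(mask.sum()))
```

Whether the downstream module also needs the mask, or only the fixed sequence length, depends on the actual LIMoE implementation and is not something this sketch settles.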

Could you clarify a few things?

  1. Which GitHub branch and commit should be used with tshiamor/forcevla-sfp-sc-data24?
  2. Which config name loads this checkpoint correctly?
  3. Is this checkpoint a full OpenPI/ForceVLA checkpoint, a LoRA checkpoint, or a train-state checkpoint?
  4. Does it require use_joint_state=True?
  5. What exact observation schema does it expect? For example, image keys, state layout, and action layout.
  6. Is the prefix_out_fix + LIMoE padding change required for inference with this checkpoint?
  7. Do you have a minimal command or script for running sample_actions() on one example observation?
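
For question 3, one way to tell the checkpoint kinds apart is to look at the key paths of the restored parameter tree. The sketch below is only a heuristic over a plain nested dict: the key names it checks (`opt_state`, `step`, and any path component containing `lora`) are assumptions about how such trees are typically laid out, not confirmed names from this checkpoint, and `guess_checkpoint_kind` is a hypothetical helper.

```python
def flatten_keys(tree, prefix=()):
    """Yield every leaf key path in a nested dict as a tuple of strings."""
    for key, value in tree.items():
        path = prefix + (str(key),)
        if isinstance(value, dict):
            yield from flatten_keys(value, path)
        else:
            yield path

def guess_checkpoint_kind(tree):
    """Heuristically label a restored tree; all key names are assumptions."""
    paths = list(flatten_keys(tree))
    # Optimizer state or a step counter at the top level suggests a train state.
    if any(p[0] in ("opt_state", "step") for p in paths):
        return "train-state"
    # Every leaf living under a LoRA-named key suggests adapter-only weights.
    if paths and all(any("lora" in part.lower() for part in p) for p in paths):
        return "lora-only"
    return "full-params"

# Hypothetical trees, with integers standing in for weight arrays.
print(guess_checkpoint_kind({"params": {"llm": {"lora_a": 0, "lora_b": 0}}}))
print(guess_checkpoint_kind({"params": {"llm": {"kernel": 0}}, "step": 1}))
print(guess_checkpoint_kind({"params": {"llm": {"kernel": 0, "lora_a": 0}}}))
```

A tree that mixes base weights and LoRA-named weights is labelled `full-params` here, since it could be loaded without a separate base checkpoint.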

I am trying to keep my local ForceVLA patch minimal and avoid copying dataset-specific changes unless they are required.

Thanks.

I'm glad you find the work helpful.
It was just my attempt at learning ForceVLA while rushing together a solution, which I was learning on the fly, for a competition (Intrinsic AI for Industry Challenge). However, the qualification event has passed and I missed it, as I did not have a good dataset by the deadline.
This is a work in progress and a bit rough, as I'm still fixing several items in my spare time. The branch to look out for is develop ( https://github.com/tshiamor/ForceVLA/tree/develop ). The latest dataset I'm working on is https://huggingface.co/datasets/tshiamor/aic_gt_sfp_all_trimmed_v2 and its checkpoint is https://huggingface.co/tshiamor/forcevla-sfp-all-trimmed-v2 (I wouldn't use it yet, until it is verified). There is also a version with joint states: https://huggingface.co/datasets/tshiamor/aic_gt_sfp_all_trimmed_v3 (ongoing). I think these two datasets are a bit more complete, and I would advise using those instead of forcevla-sfp-sc-data24.
I have created a guide as an overview, which also tries to answer some of your questions while I put things together: https://github.com/tshiamor/ForceVLA/blob/develop/GUIDE.md

Thanks, this is really helpful. I am going through GUIDE.md and will test the v2 checkpoint as an inference reference. I am doing my master's thesis on multi-geometry, multi-orientation peg-in-hole insertion on a Doosan M1013 robot. I have not started collecting data yet (I plan to teleoperate the real robot), but your work gives me hope that ForceVLA is something worth considering.

Three small clarifications:

  1. For forcevla_sfp_all_trimmed_v2, are the 7D action deltas expressed in the robot base frame, TCP/local end-effector frame, or another task frame?

  2. Is the wrench in observation.state.wrench expressed in the wrist sensor frame, TCP frame, or robot base frame?

  3. Do you have a standalone non-ROS Python script that loads tshiamor/forcevla-sfp-all-trimmed-v2 and runs sample_actions() on one example observation/checkpoint? I want to test checkpoint loading without the AIC ROS stack first.
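
The frame questions matter because the same physical wrench has different components in different frames. As a minimal sketch, assuming a known rigid transform from the sensor frame to the base frame (rotation `R_bs`, and the sensor origin `p_bs` expressed in the base frame) and a `[fx, fy, fz, tx, ty, tz]` layout, the standard change of frame looks like this. The function name, the layout, and the example values are hypothetical and not taken from the ForceVLA code or datasets.

```python
import numpy as np

def transform_wrench(wrench_s, R_bs, p_bs):
    """Re-express a wrench measured at the sensor origin in the base frame.

    wrench_s: [fx, fy, fz, tx, ty, tz] in the sensor frame (assumed layout).
    R_bs:     3x3 rotation taking sensor-frame vectors to the base frame.
    p_bs:     sensor origin expressed in the base frame, in metres.
    """
    wrench_s = np.asarray(wrench_s, dtype=float)
    f_s, tau_s = wrench_s[:3], wrench_s[3:]
    f_b = R_bs @ f_s
    # The torque picks up the moment of the force about the base origin.
    tau_b = R_bs @ tau_s + np.cross(p_bs, f_b)
    return np.concatenate([f_b, tau_b])

# Example: sensor rotated 90 degrees about z and mounted 0.1 m above the base origin.
R_bs = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])
p_bs = np.array([0.0, 0.0, 0.1])
wrench_s = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # 1 N along the sensor x axis
print(transform_wrench(wrench_s, R_bs, p_bs))
```

In this example a pure force along the sensor x axis becomes a force along base y plus a torque about base x, which is why a policy trained on one convention cannot be fed data recorded in another without conversion. The same rotation applies to translational action deltas: a delta expressed in the TCP frame maps to the base frame as `R_bs @ delta`.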
