---
license: mit
license_link: https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/resolve/main/LICENSE
language:
- multilingual
pipeline_tag: text-generation
tags:
- nlp
- code
- vision
widget:
- messages:
- role: user
content: <|image_1|>\nWhat action should the robot take to {lang}?
---
## TraceVLA-7B
``TraceVLA-7B`` is a vision-language-action model obtained by finetuning the base [OpenVLA](https://huggingface.co/openvla/openvla-7b) model with the [visual trace prompting technique](https://arxiv.org/abs/2412.10345).
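Visual trace prompting overlays the recent 2D trajectories of tracked points (the robot end effector and moving objects) onto the current observation, so the policy can see short-term motion history in a single image. The snippet below is a minimal illustrative sketch of this idea using PIL; it is not the actual `TraceProcessor` implementation, and the `tracks` array (per-point trajectories, e.g. produced by Co-Tracker) is assumed to be given.
```python
# Minimal sketch of visual trace prompting (illustrative, not the TraceProcessor code):
# given per-point 2D trajectories over the last few frames, draw each trajectory
# as a polyline on top of the current observation.
import numpy as np
from PIL import Image, ImageDraw

def overlay_traces(image: Image.Image, tracks: np.ndarray) -> Image.Image:
    """image: current RGB observation; tracks: (num_points, num_steps, 2) pixel coordinates."""
    overlaid = image.copy()
    draw = ImageDraw.Draw(overlaid)
    for point_track in tracks:
        # Draw the historical trace of one tracked point as a connected polyline.
        draw.line([tuple(xy) for xy in point_track], fill=(255, 0, 0), width=3)
        # Mark the most recent position of the point.
        x, y = point_track[-1]
        draw.ellipse([x - 4, y - 4, x + 4, y + 4], fill=(0, 255, 0))
    return overlaid
```
In TraceVLA, both the original observation and the trace-overlaid image are passed to the model, as shown in the inference code below.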
### Results on SimplerEnv (Fractal + Bridge):
#### Fractal:
| Policy/Settings | Pick up Coke | Move near | Open/Close Drawer | Put in Drawer | Average Success Rate |
|:------:|:------------:|:---------:|:------------:|:-----------:|:-------:|
| (Visual Matching) OpenVLA-7B | 23.7% | **65.0%** | 57.4% | 0.0% | 36.5% |
| (Visual Matching) TraceVLA-7B | **45.0%** | 63.8% | **63.1%** | **11.1%** | **45.8%** |
| (Variant Aggregation) OpenVLA-7B | 61.3% | 55.8% | 24.9% | 1.0% | 35.8% |
| (Variant Aggregation) TraceVLA-7B | **64.3%** | **60.6%** | **61.6%** | **12.5%** | **49.8%** |
#### Bridge:
| Policy/Settings | Put Spoon | Put Carrot | Stack Block | Put Eggplant | Average Success Rate |
|:------:|:------------:|:---------:|:------------:|:-----------:|:-------:|
| OpenVLA-7B | 8.3% | 8.3% | 4.2% | 45.8% | 16.7% |
| TraceVLA-7B | **12.5%** | **16.6%** | **16.6%** | **65.0%** | **27.7%** |
### Sample Inference Code
Here is sample inference code for the TraceVLA-7B model.
```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_path = "furonghuang-lab/tracevla_7b"

# Load processor & VLA
processor = AutoProcessor.from_pretrained(
    model_path,
    trust_remote_code=True,
    num_crops=1,
)
vla = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    _attn_implementation='flash_attention_2',
    use_cache=True
).to(device='cuda')

# Load the visual trace processor
# (cotracker_model_path is the path to your downloaded scaled_offline.pth checkpoint)
from prismatic.eval.trace_processor import TraceProcessor
trace_processor = TraceProcessor(cotracker_model_path)

# Grab image input & format prompt.
# If the visual trace returned by Co-Tracker is not valid, fall back to the default OpenVLA prompt.
openvla_prompt_template = "In: What action should the robot take to {task_description}?\nOut:"
tracevla_prompt_template = "In: You are given two images: one with the original robot observation, and another one marked with historical traces of the robot end effector and moving objects, separated by a special separator token. What action should the robot take to {task_description}?\nOut:"

image: Image.Image = get_from_camera(...)
image_overlaid, has_trace = trace_processor.process_image(image)
if not has_trace:
    prompt = openvla_prompt_template.format(task_description=task_description)
    inputs = processor(prompt, [image, image]).to(device='cuda', dtype=torch.bfloat16)
else:
    prompt = tracevla_prompt_template.format(task_description=task_description)
    inputs = processor(prompt, [image, image_overlaid]).to(device='cuda', dtype=torch.bfloat16)

# Predict the action
with torch.inference_mode():
    action = vla.predict_action(**inputs)

# Execute the action
robot.act(action, ...)
```
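The `TraceProcessor` above requires a Co-Tracker checkpoint (`scaled_offline.pth`). One possible way to fetch it is sketched below, assuming the checkpoint is published in the `facebook/cotracker3` repository on the Hugging Face Hub; check the TraceVLA repository for the exact download instructions.
```python
# Illustrative download of the Co-Tracker checkpoint; the repo_id below is an
# assumption, so verify the source against the TraceVLA repository instructions.
from huggingface_hub import hf_hub_download

cotracker_model_path = hf_hub_download(
    repo_id="facebook/cotracker3",   # assumed checkpoint location
    filename="scaled_offline.pth",   # checkpoint referenced in the code above
)
```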
For more examples, including scripts for finetuning TraceVLA models on your own robot demonstration datasets, check out our [repository](https://github.com/FrankZheng2022/tracevla).
### Citation
If you find our code or models useful in your work, please cite [our paper](https://arxiv.org/abs/2412.10345):
```bibtex
@misc{zheng2024tracevlavisualtraceprompting,
title={TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies},
author={Ruijie Zheng and Yongyuan Liang and Shuaiyi Huang and Jianfeng Gao and Hal Daumé III and Andrey Kolobov and Furong Huang and Jianwei Yang},
year={2024},
eprint={2412.10345},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2412.10345},
}
```