whyisverysmart/Fourier-LLaVA-v1.5-7B-64

Tags: vision-language-model, token-compression, frequency-domain

Fourier-LLaVA-v1.5-7B-64

Official checkpoints for "Fourier Compressor: Frequency-Domain Visual Token Compression for Vision-Language Models".
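To make "frequency-domain visual token compression" concrete, here is a minimal NumPy sketch of one generic way to shrink a grid of visual tokens: take a 2D FFT over the spatial grid, keep only the central low-frequency block, and invert. This is an illustrative assumption, not the authors' implementation; the function name `lowpass_compress` and its parameters are hypothetical, and the released checkpoints may use a different transform or filter. The 24 x 24 grid corresponds to the 576 visual tokens that LLaVA-v1.5 produces per image.

```python
import numpy as np

def lowpass_compress(tokens, grid, keep):
    """Hypothetical helper: reduce (grid*grid, dim) tokens to (keep*keep, dim)
    by cropping the low-frequency part of a 2D FFT taken over the token grid."""
    dim = tokens.shape[-1]
    x = tokens.reshape(grid, grid, dim)
    # 2D FFT over the two spatial axes, with the DC term shifted to the centre.
    spec = np.fft.fftshift(np.fft.fft2(x, axes=(0, 1)), axes=(0, 1))
    # Keep the central keep x keep block, i.e. the lowest spatial frequencies.
    start = grid // 2 - keep // 2
    low = spec[start:start + keep, start:start + keep, :]
    # Undo the shift and invert on the smaller grid.
    low = np.fft.ifftshift(low, axes=(0, 1))
    # Rescale so that a constant input keeps its value after shrinking.
    scale = (keep * keep) / (grid * grid)
    out = np.fft.ifft2(low, axes=(0, 1)).real * scale
    return out.reshape(keep * keep, dim)

# Example: 576 tokens (24 x 24) reduced to 64 tokens (8 x 8).
rng = np.random.default_rng(0)
tokens = rng.standard_normal((576, 16))
print(lowpass_compress(tokens, grid=24, keep=8).shape)  # (64, 16)
```

Taking the real part discards the small imaginary residue left by cropping an even-sized spectrum; a DCT-based variant would avoid that step.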

Model Details

| Model | Base Model | Visual Tokens | Compression | Weights |
|---|---|---|---|---|
| Fourier-LLaVA-v1.5-7B-256 | LLaVA-v1.5-7B | 256 | 55.6% | 🤗 HF |
| Fourier-LLaVA-v1.5-7B-144 | LLaVA-v1.5-7B | 144 | 75.0% | 🤗 HF |
| Fourier-LLaVA-v1.5-7B-64 | LLaVA-v1.5-7B | 64 | 88.9% | 🤗 HF |
| Fourier-LLaVA-v1.5-7B-36 | LLaVA-v1.5-7B | 36 | 93.8% | 🤗 HF |
| Fourier-LLaVA-v1.5-13B-144 | LLaVA-v1.5-13B | 144 | 75.0% | 🤗 HF |
| Fourier-Qwen2-VL-2B-0.67 | Qwen2-VL-2B-Instruct | Dynamic | 55.6% | 🤗 HF |
| Fourier-Qwen2.5-VL-3B-0.67 | Qwen2.5-VL-3B-Instruct | Dynamic | 55.6% | 🤗 HF |
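The Compression column for the LLaVA rows can be reproduced from the token counts, assuming the uncompressed baseline is the 576 visual tokens (a 24 x 24 patch grid) that LLaVA-v1.5 produces per image. The short sketch below shows the arithmetic; the helper name `compression_pct` is hypothetical, and the Qwen rows use dynamic token counts, so they are not covered here.

```python
# Assumption: LLaVA-v1.5 emits 576 visual tokens per image (24 x 24 patches).
BASELINE_TOKENS = 576

def compression_pct(kept, baseline=BASELINE_TOKENS):
    """Percentage of visual tokens removed relative to the baseline."""
    return round(100.0 * (1 - kept / baseline), 1)

for kept in (256, 144, 64, 36):
    print(f"{kept:>3} tokens kept: {compression_pct(kept)}% removed")
```

For this checkpoint, keeping 64 of 576 tokens removes about 88.9% of the visual tokens.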

Safetensors: model size 7B params, tensor type F16

Model tree for whyisverysmart/Fourier-LLaVA-v1.5-7B-64

Base model: liuhaotian/llava-v1.5-7b

Finetuned (32): this model

Collection including whyisverysmart/Fourier-LLaVA-v1.5-7B-64

Fourier Compressor (8 items): official checkpoints for "Fourier Compressor: Frequency-Domain Visual Token Compression for Vision-Language Models".

Paper for whyisverysmart/Fourier-LLaVA-v1.5-7B-64

Fourier-VLM: Compressing Vision Tokens in the Frequency Domain for Large Vision-Language Models (arXiv:2508.06038, published Aug 8, 2025)