Qwen3.5-4B Text Backbone / Qwen3.5-4B 文本骨干网络

This repository contains the text backbone (language_model) of the Qwen/Qwen3.5-4B VLM, extracted and saved as a standalone model for use as a text encoder.

从 Qwen/Qwen3.5-4B 视觉语言模型中提取的文本骨干网络（language_model），保存为独立模型用作文本编码器。

Why? / 为什么？

Qwen3.5 models are vision-language models (VLMs) that include both vision and text components. For text-only use cases (e.g., text encoding for motion generation), loading the full VLM wastes bandwidth, disk space, and memory on unused vision components. This repo provides just the text backbone.

Qwen3.5 模型是包含视觉和文本组件的视觉语言模型（VLM）。对于纯文本场景（例如用于运动生成的文本编码），加载完整 VLM 会浪费带宽、磁盘空间和内存。本仓库仅提供文本骨干网络。

Model Details / 模型详情

Source: Qwen/Qwen3.5-4B
Extracted component: model.language_model (Qwen3_5TextModel)
Hidden size: 2560
Layers: 32 (hybrid: 24 linear attention + 8 full attention)
Dtype: bfloat16
Config model_type: qwen3_5_text
Requires: transformers>=5.2.0
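
As a quick sanity check on the layer counts above, the sketch below builds one possible hybrid layout and counts each attention type. The card only states the totals (24 linear attention + 8 full attention). The interleaving used here, full attention on every 4th layer, is an assumption chosen to reproduce that split, and `FULL_ATTENTION_INTERVAL` and `layer_types` are hypothetical names rather than fields read from the real config.

```python
# Illustrative arithmetic only: the actual layer order is defined by the
# model's config, not by this sketch.
NUM_LAYERS = 32
FULL_ATTENTION_INTERVAL = 4  # assumption: every 4th layer uses full attention


def layer_types(num_layers: int, interval: int) -> list[str]:
    """Return one attention-type label per layer under the assumed layout."""
    return [
        "full_attention" if (i + 1) % interval == 0 else "linear_attention"
        for i in range(num_layers)
    ]


types = layer_types(NUM_LAYERS, FULL_ATTENTION_INTERVAL)
counts = {kind: types.count(kind) for kind in ("linear_attention", "full_attention")}
print(counts)  # {'linear_attention': 24, 'full_attention': 8}
```

If the loaded config exposes the real per-layer layout, prefer reading it from there over relying on this assumed pattern.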

Usage / 使用方法

Direct loading / 直接加载

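from transformers import AutoModel, AutoTokenizer

# Load the standalone text backbone in bfloat16, the dtype listed under Model Details.
model = AutoModel.from_pretrained("Qian2501/Qwen3.5-4B-text", dtype="bfloat16")
# The tokenizer in this repo is the one saved from the original Qwen/Qwen3.5-4B.
tokenizer = AutoTokenizer.from_pretrained("Qian2501/Qwen3.5-4B-text")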
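
Once the model and tokenizer are loaded, text encoding might look like the sketch below. It returns both per-token embeddings and a masked mean-pooled sentence vector. Mean pooling is only one common choice and is an assumption here: this card does not say how downstream consumers such as Kimodo aggregate tokens. `masked_mean_pool` and `encode_texts` are hypothetical helper names.

```python
import torch


def masked_mean_pool(hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over non-padding positions.

    hidden: (batch, seq, dim) token embeddings.
    mask:   (batch, seq) with 1 for real tokens and 0 for padding.
    """
    weights = mask.unsqueeze(-1).to(hidden.dtype)    # (batch, seq, 1)
    summed = (hidden * weights).sum(dim=1)           # (batch, dim)
    counts = weights.sum(dim=1).clamp(min=1.0)       # avoid division by zero
    return summed / counts


@torch.no_grad()
def encode_texts(texts, model, tokenizer):
    """Return (per-token embeddings, pooled embeddings) for a list of strings."""
    batch = tokenizer(texts, return_tensors="pt", padding=True).to(model.device)
    outputs = model(**batch)
    tokens = outputs.last_hidden_state               # (batch, seq, 2560) for this model
    # Pool in float32 to limit rounding error from bfloat16 accumulation.
    pooled = masked_mean_pool(tokens.float(), batch["attention_mask"])
    return tokens, pooled
```

With the objects from the loading snippet, `encode_texts(["A person walks forward"], model, tokenizer)` would be expected to return a pooled tensor with a last dimension of 2560.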

With Kimodo / 配合 Kimodo 使用

TEXT_ENCODER=qwen3.5-4b TEXT_ENCODER_MODE=local TEXT_ENCODER_DEVICE=cpu kimodo_gen "A person walks forward" --bvh

Note: A trained projection layer (2560 -> 4096) is required for use with Kimodo, since this model's hidden size (2560) does not match the denoiser's expected dimension (4096). See the Kimodo README for details on training projection layers.

注意：配合 Kimodo 使用时必须训练投影层（2560 -> 4096），因为该模型的 hidden size（2560）与 denoiser 期望的维度（4096）不匹配。投影层训练方法请参阅 Kimodo README。
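
To illustrate the dimension mismatch described in the note, here is a minimal sketch of a projection from 2560 to 4096 dimensions. It assumes a single linear layer. The architecture Kimodo actually trains is not described in this card and may differ, and `TextProjection` is a hypothetical name. The weights here are randomly initialized, so the output is only useful for checking shapes.

```python
import torch
from torch import nn

ENCODER_DIM = 2560   # hidden size of this text backbone (Model Details)
DENOISER_DIM = 4096  # dimension the denoiser expects, per the note above


class TextProjection(nn.Module):
    """Hypothetical single-layer projection; not Kimodo's actual module."""

    def __init__(self, in_dim: int = ENCODER_DIM, out_dim: int = DENOISER_DIM):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # nn.Linear acts on the last dimension, so (batch, seq, 2560)
        # becomes (batch, seq, 4096).
        return self.proj(token_embeddings)


projection = TextProjection()
dummy_embeddings = torch.randn(2, 7, ENCODER_DIM)  # stand-in for encoder output
projected = projection(dummy_embeddings)
print(projected.shape)  # torch.Size([2, 7, 4096])
```

In practice the projection has to be trained as described in the Kimodo README; an untrained layer like this one will not produce embeddings the denoiser can use.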

Extraction / 提取方法

This model was extracted using:

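from transformers import AutoModel, AutoTokenizer

# Load the full vision-language model in bfloat16.
full_model = AutoModel.from_pretrained("Qwen/Qwen3.5-4B", dtype="bfloat16")
# Keep only the text backbone; the vision components are not saved.
text_model = full_model.language_model
# Save the backbone as a standalone checkpoint, in shards of at most 4 GB.
text_model.save_pretrained("Qwen3.5-4B-text", max_shard_size="4GB")
# Save the original tokenizer into the same directory.
AutoTokenizer.from_pretrained("Qwen/Qwen3.5-4B").save_pretrained("Qwen3.5-4B-text")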
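
After an extraction like this, it can be useful to confirm that the saved config matches the values listed under Model Details. The sketch below is one way to do that. `check_text_config` is a hypothetical helper, and it assumes the standard `model_type`, `hidden_size`, and `num_hidden_layers` config attributes. A stand-in object is used here so the snippet runs without downloading anything.

```python
from types import SimpleNamespace

# Expected values taken from the Model Details section of this card.
EXPECTED = {
    "model_type": "qwen3_5_text",
    "hidden_size": 2560,
    "num_hidden_layers": 32,
}


def check_text_config(config) -> list[str]:
    """Return a list of mismatch messages; an empty list means all fields match."""
    problems = []
    for name, expected in EXPECTED.items():
        actual = getattr(config, name, None)
        if actual != expected:
            problems.append(f"{name}: expected {expected!r}, got {actual!r}")
    return problems


# Stand-in for a real config object, used only to demonstrate the check.
fake_config = SimpleNamespace(
    model_type="qwen3_5_text", hidden_size=2560, num_hidden_layers=32
)
print(check_text_config(fake_config))  # []
```

Against the real output directory, the same check could be run as `check_text_config(AutoConfig.from_pretrained("Qwen3.5-4B-text"))`, with `AutoConfig` imported from `transformers`.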

License / 许可证

Apache-2.0 (same as the original Qwen3.5-4B)

Related / 相关链接

Qwen/Qwen3.5-4B - Original VLM
Qian2501/Qwen3.5-9B-text - 9B text backbone
Kimodo - Kinematic Motion Diffusion Model
Qian2501/kimodo-qwen3-projection - Qwen3 projection layer
