Paper: Beyond Chain-of-Thought: Rewrite as a Universal Interface for Generative Multimodal Embeddings (arXiv:2604.22280)
RIME (Rewrite-drIven Multimodal Embedding) is a multimodal embedding model based on Qwen2-VL-2B-Instruct.
RIME jointly optimizes generation and embedding through a retrieval-friendly rewrite paradigm, producing both discriminative and generative multimodal embeddings for text, images, videos, and visual documents.
See the RIME repository for inference and evaluation examples.
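The repository holds the actual inference code. As a rough illustration of how embeddings like these are commonly used for retrieval, the sketch below ranks candidates by cosine similarity to a query embedding. It does not use the RIME API: the function names `cosine_similarity` and `rank_candidates`, the choice of cosine similarity as the scoring function, and the toy 3-dimensional vectors are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(query_emb, candidate_embs):
    """Return candidate indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_emb, c) for c in candidate_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for real model outputs (hypothetical values,
# not produced by RIME).
query = [1.0, 0.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.5, 0.5, 0.0],
]

ranking = rank_candidates(query, candidates)
print(ranking)
```

In practice the query and candidate vectors would come from the model, and each could be an embedding of text, an image, a video, or a visual document.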
@article{wu2026beyond,
title={Beyond Chain-of-Thought: Rewrite as a Universal Interface for Generative Multimodal Embeddings},
author={Wu, Peixi and Mei, Ke and Ma, Feipeng and Chai, Bosong and Lan, Zhibin and Zhao, Chenxi and Yan, Shannan and Chen, Jie and Hu, Zhangchi and Peng, Yansong and others},
journal={arXiv preprint arXiv:2604.22280},
year={2026}
}