Model Summary
The TVC models are 7B-parameter models based on Qwen2-VL-7B-Instruct, with a context window of 8K tokens.
- Repository: https://github.com/sun-hailong/TVC
- Languages: English, Chinese
- Paper: https://arxiv.org/abs/2503.13360
Model Architecture
- Architecture: Qwen2-VL-7B-Instruct
- Data: a mixture of 300K long-chain reasoning samples
- Precision: BFloat16
Hardware & Software
- Hardware: 64 × NVIDIA Tesla H20 GPUs
- Orchestration: HuggingFace Trainer
- Code: PyTorch
Framework versions
- Transformers 4.46.1
- PyTorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3
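Since TVC is built on Qwen2-VL-7B-Instruct, it can presumably be loaded through the standard Transformers Qwen2-VL interface. The sketch below is an assumption, not an official recipe: the checkpoint id is left as a parameter (the released name is in the GitHub repository), and generation settings are illustrative. BFloat16 matches the precision listed above.

```python
def build_messages(image_path: str, question: str) -> list:
    """Build a Qwen2-VL style chat message with one image and one text turn."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]


def run_inference(model_id: str, image_path: str, question: str) -> str:
    """Hedged inference sketch; requires a GPU and the released TVC weights.

    model_id is a placeholder -- substitute the actual checkpoint name
    from https://github.com/sun-hailong/TVC.
    """
    import torch
    from PIL import Image
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

    # BFloat16 matches the precision stated in the model card.
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    messages = build_messages(image_path, question)
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image = Image.open(image_path)
    inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=1024)
    return processor.batch_decode(out, skip_special_tokens=True)[0]
```

The heavy dependencies are imported inside `run_inference` so the message-building helper stays usable without Transformers installed.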
Citation
```bibtex
@article{sun2024mitigating,
  title={Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning},
  author={Sun, Hai-Long and Sun, Zhun and Peng, Houwen and Ye, Han-Jia},
  journal={arXiv preprint arXiv:2503.13360},
  year={2025}
}
```