Vintern-1B-v2-ViTable-docvqa
Vintern-1B-v2-ViTable-docvqa is a fine-tuned version of the 5CD-AI/Vintern-1B-v2 multimodal model for Vietnamese document visual question answering (DocVQA) on table data.
Benchmarks
| Model | ANLS | Semantic Similarity | MLLM-as-judge (Gemini) |
|---|---|---|---|
| Gemini 1.5 Flash | 0.35 | 0.56 | 0.40 |
| Vintern-1B-v2 | 0.04 | 0.45 | 0.50 |
| Vintern-1B-v2-ViTable-docvqa | 0.50 | 0.71 | 0.59 |
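For readers unfamiliar with the ANLS column, the following is a minimal sketch of Average Normalized Levenshtein Similarity as it is commonly defined for DocVQA. This card does not state how its scores were computed, so the 0.5 threshold, the lowercase/strip normalization, and the function names `levenshtein` and `anls` are assumptions for illustration, not the authors' evaluation script.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def anls(predictions, references, threshold=0.5):
    """Average over questions of the best per-question similarity.

    predictions: list of predicted answer strings.
    references:  list of lists of accepted ground-truth answers.
    threshold:   assumed cut-off; normalized distances at or above it score 0.
    """
    scores = []
    for pred, golds in zip(predictions, references):
        pred_norm = pred.strip().lower()  # assumed normalization
        best = 0.0
        for gold in golds:
            gold_norm = gold.strip().lower()
            longest = max(len(pred_norm), len(gold_norm))
            nl = levenshtein(pred_norm, gold_norm) / longest if longest else 0.0
            similarity = 1.0 - nl if nl < threshold else 0.0
            best = max(best, similarity)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0
```

Because ANLS rewards near-exact string matches only, a model that answers correctly but in a different wording can score low on ANLS while scoring higher on the semantic-similarity and MLLM-as-judge columns.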
Usage
Try the model in the 🤗 HF Demo, or open it in Colab.
Citation:
@misc{doan2024vintern1befficientmultimodallarge,
title={Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese},
author={Khang T. Doan and Bao G. Huynh and Dung T. Hoang and Thuc D. Pham and Nhat H. Pham and Quan T. M. Nguyen and Bang Q. Vo and Suong N. Hoang},
year={2024},
eprint={2408.12480},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2408.12480},
}