Vintern-1B-v2-ViTable-docvqa

Report Link👁️

Vintern-1B-v2-ViTable-docvqa is a fine-tuned version of the 5CD-AI/Vintern-1B-v2 multimodal model for Vietnamese document visual question answering (DocVQA) on table data.

Benchmarks

| Model | ANLS | Semantic Similarity | MLLM-as-judge (Gemini) |
| --- | --- | --- | --- |
| Gemini 1.5 Flash | 0.35 | 0.56 | 0.40 |
| Vintern-1B-v2 | 0.04 | 0.45 | 0.50 |
| Vintern-1B-v2-ViTable-docvqa | 0.50 | 0.71 | 0.59 |
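
For readers unfamiliar with the ANLS column, the sketch below shows how Average Normalized Levenshtein Similarity is commonly computed for DocVQA. It assumes the usual definition with a 0.5 threshold and case-insensitive comparison. This card does not state which evaluation script or settings produced the numbers above, so the function names and defaults here are illustrative only.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions: Sequence[str],
         references: Sequence[List[str]],
         threshold: float = 0.5) -> float:
    """Average over questions of the best per-question similarity.

    Assumption: similarity is 1 - NL when the normalized distance NL is
    below `threshold`, and 0 otherwise (the common DocVQA convention).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        return 0.0
    total = 0.0
    for pred, golds in zip(predictions, references):
        p = pred.strip().lower()
        best = 0.0
        for gold in golds:
            g = gold.strip().lower()
            longest = max(len(p), len(g))
            nl = levenshtein(p, g) / longest if longest else 0.0
            score = 1.0 - nl if nl < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)
```

Under this definition an exact match scores 1.0, a near miss such as "12.5" against "12.6" scores 0.75, and an answer that differs in half or more of its characters scores 0.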

Usage

Check out this 🤗 HF Demo, or open it in Colab.
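
For local inference, a minimal sketch follows. It assumes this checkpoint keeps the InternVL-style custom code of the base 5CD-AI/Vintern-1B-v2 model, in particular a `chat` method and the `<image>` prompt prefix, and therefore needs `trust_remote_code=True`. The helpers `build_question`, `load_single_tile`, and `answer` are hypothetical names introduced here. The single 448x448 tile is a simplification that may lose detail on dense tables; the demo and Colab notebook remain the reference for the preprocessing actually used.

```python
MODEL_ID = "YuukiAsuna/Vintern-1B-v2-ViTable-docvqa"
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def build_question(question: str) -> str:
    """Prefix the question with the image placeholder (assumed prompt format)."""
    return "<image>\n" + question


def load_single_tile(image_path: str, size: int = 448):
    """Resize the whole page to one normalized tile of shape (1, 3, size, size)."""
    # Heavy imports are deferred so the module can be imported without them.
    import numpy as np
    import torch
    from PIL import Image

    image = Image.open(image_path).convert("RGB").resize((size, size), Image.BICUBIC)
    array = np.asarray(image, dtype=np.float32) / 255.0
    mean = np.array(IMAGENET_MEAN, dtype=np.float32)
    std = np.array(IMAGENET_STD, dtype=np.float32)
    array = (array - mean) / std
    return torch.from_numpy(array).permute(2, 0, 1).unsqueeze(0)


def answer(image_path: str, question: str) -> str:
    """Load the model and answer one question about one document image."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = AutoModel.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,  # the repository ships custom model code
    ).eval().to(device)
    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_ID, trust_remote_code=True, use_fast=False
    )
    pixel_values = load_single_tile(image_path).to(torch.bfloat16).to(device)
    generation_config = dict(max_new_tokens=256, do_sample=False)
    # `chat` comes from the repository's remote code (assumed InternVL-style).
    return model.chat(tokenizer, pixel_values, build_question(question), generation_config)
```

A call such as `answer("table.png", "Tổng doanh thu là bao nhiêu?")` would return the model's answer as a string; the file name and question are placeholders.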

Citation:

@misc{doan2024vintern1befficientmultimodallarge,
      title={Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese}, 
      author={Khang T. Doan and Bao G. Huynh and Dung T. Hoang and Thuc D. Pham and Nhat H. Pham and Quan T. M. Nguyen and Bang Q. Vo and Suong N. Hoang},
      year={2024},
      eprint={2408.12480},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2408.12480}, 
}

Model size: 938M params (Safetensors, BF16 tensors)