khang119966
committed on
Update README.md
README.md
CHANGED
@@ -30,7 +30,7 @@ tags:

 ## Vintern-1B-v2 ❄️ (Viet-InternVL2-1B-v2) - The LLaVA Challenger

-We are excited to introduce **Vintern-1B-v2** the Vietnamese 🇻🇳 multimodal model that combines the advanced Vietnamese language model [Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct)[1] with the latest visual model, [InternViT-300M-448px](https://huggingface.co/OpenGVLab/InternViT-300M-448px)[2], CVPR 2024. This model excels in tasks such as OCR-VQA, Doc-VQA, and Chart-VQA,... With only 1 billion parameters, it is **4096 context length** finetuned from the [Viet-
+We are excited to introduce **Vintern-1B-v2** the Vietnamese 🇻🇳 multimodal model that combines the advanced Vietnamese language model [Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct)[1] with the latest visual model, [InternViT-300M-448px](https://huggingface.co/OpenGVLab/InternViT-300M-448px)[2], CVPR 2024. This model excels in tasks such as OCR-VQA, Doc-VQA, and Chart-VQA,... With only 1 billion parameters, it is **4096 context length** finetuned from the [Viet-InternVL2-1B](https://huggingface.co/5CD-AI/Viet-InternVL2-1B) model on over 3 million specialized image-question-answer pairs for optical character recognition, text recognition, document extraction, and general VQA. The model can be integrated into various on-device applications 📱, demonstrating its versatility and robust capabilities.

 [**\[🤗 HF Demo\]**](https://huggingface.co/spaces/khang119966/Vintern-v2-Demo)