5CD-AI/Vintern-1B-v2

khang119966 committed on Aug 11, 2024

Commit 4799099 · verified · 1 Parent(s): c4cb64d

Update README.md

Files changed (1)

README.md +1 -1


@@ -30,7 +30,7 @@ tags:
 ## Vintern-1B-v2 ❄️ (Viet-InternVL2-1B-v2) - The LLaVA 🌋 Challenger
-We are excited to introduce  **Vintern-1B-v2** the Vietnamese 🇻🇳 multimodal model that combines the advanced Vietnamese language model [Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct)[1] with the latest visual model, [InternViT-300M-448px](https://huggingface.co/OpenGVLab/InternViT-300M-448px)[2], CVPR 2024. This model excels in tasks such as OCR-VQA, Doc-VQA, and Chart-VQA,... With only 1 billion parameters, it is **4096 context length** finetuned from the [Viet-InternVL-1B](https://huggingface.co/datasets/5CD-AI/Viet-InternVL-1B) model on over 3 million specialized image-question-answer pairs for optical character recognition 🔍, text recognition 🔤, document extraction 📑, and general VQA. The model can be integrated into various on-device applications 📱, demonstrating its versatility and robust capabilities.
+We are excited to introduce  **Vintern-1B-v2** the Vietnamese 🇻🇳 multimodal model that combines the advanced Vietnamese language model [Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct)[1] with the latest visual model, [InternViT-300M-448px](https://huggingface.co/OpenGVLab/InternViT-300M-448px)[2], CVPR 2024. This model excels in tasks such as OCR-VQA, Doc-VQA, and Chart-VQA,... With only 1 billion parameters, it is **4096 context length** finetuned from the [Viet-InternVL2-1B](https://huggingface.co/5CD-AI/Viet-InternVL2-1B) model on over 3 million specialized image-question-answer pairs for optical character recognition 🔍, text recognition 🔤, document extraction 📑, and general VQA. The model can be integrated into various on-device applications 📱, demonstrating its versatility and robust capabilities.
 [**\[🤗 HF Demo\]**](https://huggingface.co/spaces/khang119966/Vintern-v2-Demo)