Nguyen Quang Truong

nqtruong

AI & ML interests

None yet

Recent Activity

updated a dataset about 2 months ago
nqtruong/Vietnamese_folk_music

Organizations

None yet

nqtruong's activity

New activity in haihuynh/musicgen-finetune 2 months ago

Upload state_dict.bin

#1 opened 2 months ago by nqtruong
updated a model 4 months ago
updated a Space 4 months ago
New activity in haihuynh/face-img-retrieval 8 months ago

Upload img_align_celeba.zip

#1 opened 8 months ago by nqtruong
Reacted to merve's post with 🚀 8 months ago
LLaVA-NeXT was recently merged into Hugging Face transformers, and it outperforms many closed-source models like Gemini on various benchmarks 🤩 Let's take a look!
Demo: merve/llava-next
Notebook: https://colab.research.google.com/drive/1afNudu72SNWZCYtCVrRlb9T9Vj9CFJEK?usp=sharing
LLaVA is essentially a vision-language model that consists of a ViT-based CLIP encoder, an MLP projection, and Vicuna as the decoder ✨
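As a rough illustration of that composition (not part of the original post), here is a small sketch that inspects the sub-configs of a LLaVA-NeXT checkpoint in transformers; the checkpoint name llava-hf/llava-v1.6-mistral-7b-hf is an assumed example.

```python
# Sketch: peek at the two sub-configs of a LLaVA-NeXT checkpoint.
# "llava-hf/llava-v1.6-mistral-7b-hf" is an assumed example checkpoint.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")

print(config.vision_config.model_type)  # the ViT-based CLIP image encoder
print(config.text_config.model_type)    # the decoder LLM (Mistral for this checkpoint)
# The MLP projection sits between these two components inside the model itself.
```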
LLaVA 1.5 was released with Vicuna, but LLaVA-NeXT (1.6) was released with four different LLMs:
- Nous-Hermes-Yi-34B
- Mistral-7B
- Vicuna 7B & 13B
Mistral and Nous-Hermes-Yi-34B perform better and are more permissive for commercial use.
Moreover, according to the authors' findings, the improvements come from a more diverse, higher-quality data mixture and dynamic high resolution.
LLaVA based on Nous-Hermes-Yi-34B outperforms many other models, including Gemini, on various multimodal understanding and generation benchmarks 😊
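To try it out, here is a minimal inference sketch with the transformers classes for LLaVA-NeXT. The checkpoint name (llava-hf/llava-v1.6-mistral-7b-hf), the sample image URL, and the prompt are illustrative assumptions, not part of the original post; swap in whichever variant and inputs you like.

```python
# Minimal LLaVA-NeXT inference sketch with Hugging Face transformers.
# The checkpoint, image URL, and prompt below are placeholders.
import requests
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"  # assumed example checkpoint
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Any RGB image works; this URL is just an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The Mistral-based variant expects the "[INST] ... [/INST]" chat format.
prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```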