Merve Noyan PRO

merve

AI & ML interests

VLMs, vision & co

Posts 46

Post
At Hugging Face we have an open-source Cookbook with many applied AI recipes 📖
Here are some of the latest recipes contributed ⥥

- "Information Extraction with Haystack and NuExtract": Use Haystack and transformers to build structured data extraction pipelines using LLMs by @anakin87 https://huggingface.co/learn/cookbook/en/information_extraction_haystack_nuextract

- "Build RAG with Hugging Face and Milvus": Learn how to use Milvus with sentence transformers to build RAG pipelines https://huggingface.co/learn/cookbook/rag_with_hf_and_milvus

- "Code Search with Vector Embeddings and Qdrant": Search a codebase by building a retrieval pipeline using Qdrant and sentence transformers https://huggingface.co/learn/cookbook/code_search

- "Data analyst agent: get your data's insights in the blink of an eye ✨": great recipe by our own @m-ric showing how to build an agent that can do data analysis! 😱 https://huggingface.co/learn/cookbook/agent_data_analyst
Post
We recently merged Video-LLaVA into transformers! 🤗🎞️
What makes this model different?

Demo: llava-hf/video-llava
Model: LanguageBind/Video-LLaVA-7B-hf

Other models that take both image and video input either project the two separately, or downsample the video and project only selected frames. Video-LLaVA instead converts images and videos to a unified representation and projects them using a shared projection layer.

It uses Vicuna 1.5 as the language model and LanguageBind's own encoders, which are based on OpenCLIP. These encoders map the modalities to a unified representation before passing it to the projection layer.
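To make the shared-projection idea concrete, here is a toy sketch in plain Python. It is not the actual Video-LLaVA code: the dimensions, the random "encoder outputs", and every helper name are made up for illustration. The only point it shows is that image tokens and video tokens with the same feature size go through one and the same projection before reaching the language model.

```python
import random

# Toy sizes, chosen only for illustration (the real model is far larger).
ENC_DIM = 8    # feature size shared by the image and video encoder outputs
LLM_DIM = 16   # embedding size expected by the language model


def make_linear(in_dim, out_dim, seed=0):
    """Create a random (in_dim x out_dim) weight matrix as nested lists."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]


def project(tokens, weight):
    """Apply the same linear map to every token: (n, in_dim) -> (n, out_dim)."""
    columns = list(zip(*weight))  # out_dim columns, each of length in_dim
    return [[sum(t * w for t, w in zip(token, col)) for col in columns]
            for token in tokens]


def fake_encoder_output(num_tokens, dim, seed):
    """Stand-in for an encoder: random features in the unified space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)]


# One projection layer, reused for both modalities.
shared_projection = make_linear(ENC_DIM, LLM_DIM)

image_tokens = fake_encoder_output(4, ENC_DIM, seed=1)      # 4 patch tokens
video_tokens = fake_encoder_output(2 * 4, ENC_DIM, seed=2)  # 2 frames x 4 patches

image_embeds = project(image_tokens, shared_projection)
video_embeds = project(video_tokens, shared_projection)

# Both modalities end up in one sequence the language model can attend over.
visual_sequence = image_embeds + video_embeds
```

Because both encoders emit features of the same size in this sketch, no per-modality projection is needed, which is the design choice the post describes.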


I feel like one of the coolest features of this model is its joint understanding of images and videos, which many recent models have also introduced.

It's a relatively older model, but it was ahead of its time and works very well! Joint understanding means that, for example, you can pass the model an image of a cat and a video of a cat and ask whether the cat in the image also appears in the video 🤩
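Here is a rough sketch of how the cat question could look with transformers. It uses the `VideoLlavaProcessor` and `VideoLlavaForConditionalGeneration` classes and the model ID from this post. The helper names are hypothetical, and putting both `<image>` and `<video>` placeholders into a single prompt is an assumption extrapolated from the single-media "USER: ... ASSISTANT:" template, so results may vary. `video_frames` is assumed to be an array of sampled frames (frames, height, width, 3).

```python
IMAGE_TOKEN = "<image>"
VIDEO_TOKEN = "<video>"


def build_prompt(question, with_image=True, with_video=True):
    """Build a Vicuna-style prompt with one placeholder per visual input.

    Combining both placeholders in one prompt is an assumption, not a
    documented recipe.
    """
    placeholders = ""
    if with_image:
        placeholders += IMAGE_TOKEN + "\n"
    if with_video:
        placeholders += VIDEO_TOKEN + "\n"
    return f"USER: {placeholders}{question} ASSISTANT:"


def ask_image_and_video(image, video_frames, question,
                        model_id="LanguageBind/Video-LLaVA-7B-hf",
                        max_new_tokens=60):
    """Hypothetical helper: ask one question about an image and a video."""
    # Imported lazily so build_prompt works without the heavy dependency.
    from transformers import (
        VideoLlavaForConditionalGeneration,
        VideoLlavaProcessor,
    )

    processor = VideoLlavaProcessor.from_pretrained(model_id)
    model = VideoLlavaForConditionalGeneration.from_pretrained(model_id)

    inputs = processor(
        text=build_prompt(question),
        images=image,
        videos=video_frames,
        return_tensors="pt",
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0]
```

Calling `ask_image_and_video(cat_image, cat_clip, "Does the cat in the image appear in the video?")` would download the 7B checkpoint, so expect it to need a fair amount of memory.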