GUAN-TING KE

RFTFT

AI & ML interests

NLP

Recent Activity

reacted to merve's post with 👍 4 days ago

Forget any document retrievers, use ColPali 💥💥

Document retrieval is usually done through OCR + layout detection, but you lose a lot of information in between, stop doing that! 🤓 ColPali uses a vision language model, which is better at document understanding 📑

ColPali: https://huggingface.co/vidore/colpali (MIT license!)
Blog post: https://huggingface.co/blog/manu/colpali

The authors also released a new benchmark for document retrieval:
ViDoRe Benchmark: https://huggingface.co/collections/vidore/vidore-benchmark-667173f98e70a1c0fa4db00d
ViDoRe Leaderboard: https://huggingface.co/spaces/vidore/vidore-leaderboard

ColPali marries the idea of modern vision language models with retrieval 🤝 The authors apply contrastive fine-tuning to SigLIP on documents and pool the outputs (they call it BiSigLip). Then they feed the patch embedding outputs to PaliGemma and create BiPali 🖇️ BiPali natively feeds image patch embeddings to an LLM, which enables ColBERT-like late interaction computations between text tokens and image patches (hence the name ColPali!) 🤩

The authors created the ViDoRe benchmark by collecting PDF documents and generating queries with Claude-3 Sonnet. ColPali seems to be the most performant model on ViDoRe. Not only that, but it is also way faster than traditional PDF parsers!
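The ColBERT-like late interaction mentioned in the post can be illustrated with a minimal sketch. This is not the ColPali code or API: the function names, the toy 2-dimensional embeddings, and the page names below are all hypothetical. It assumes each query token and each image patch already has an embedding vector, and it scores a page by taking, for every query token, its best-matching patch (cosine similarity) and summing those maxima.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def late_interaction_score(query_tokens, page_patches):
    """MaxSim-style score: for each query token, take the highest cosine
    similarity over all patches of the page, then sum over query tokens.
    Both inputs must be non-empty lists of vectors."""
    q = [l2_normalize(v) for v in query_tokens]
    p = [l2_normalize(v) for v in page_patches]
    return sum(max(dot(qt, pp) for pp in p) for qt in q)


def rank_pages(query_tokens, pages):
    """Return page names sorted from best to worst late-interaction score."""
    scores = {
        name: late_interaction_score(query_tokens, patches)
        for name, patches in pages.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (assumed for illustration only): two query-token embeddings
# and two pages with two patch embeddings each.
query_tokens = [[2.0, 0.0], [0.0, 3.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0]],  # has a matching patch for both tokens
    "page_b": [[1.0, 0.0], [0.5, 0.0]],  # only matches the first token
}

ranking = rank_pages(query_tokens, pages)
```

Because every query token is matched independently against all patches, a page scores well only if it contains some patch relevant to each token, which is the property the post attributes to late interaction. Real systems would compute this with batched tensor operations rather than Python loops.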

upvoted an article 8 days ago

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

reacted to burtenshaw's post with 🔥 3 months ago

We’re launching a FREE and CERTIFIED course on Agents!

We're thrilled to announce the launch of the Hugging Face Agents course on Learn! This interactive, certified course will guide you through building and deploying your own AI agents.

Here's what you'll learn:

- Understanding Agents: We'll break down the fundamentals of AI agents, showing you how they use LLMs to perceive their environment (observations), reason about it (thoughts), and take actions. Think of a smart assistant that can book appointments, answer emails, or even write code based on your instructions.
- Building with Frameworks: You'll dive into popular agent frameworks like LangChain, LlamaIndex and smolagents. These tools provide the building blocks for creating complex agent behaviors.
- Real-World Applications: See how agents are used in practice, from automating SQL queries to generating code and summarizing complex documents.
- Certification: Earn a certification by completing the course modules, implementing a use case, and passing a benchmark assessment. This proves your skills in building and deploying AI agents.

Audience

This course is designed for anyone interested in the future of AI. Whether you're a developer, data scientist, or simply curious about AI, this course will equip you with the knowledge and skills to build your own intelligent agents.

Enroll today and start building the next generation of AI agent applications! https://bit.ly/hf-learn-agents

Organizations

None yet

RFTFT's activity

upvoted an article 8 days ago

Article

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

26 days ago • 372

upvoted a paper 5 months ago

LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation

Paper • 2411.04997 • Published Nov 7, 2024 • 39

upvoted a paper 7 months ago

OLMoE: Open Mixture-of-Experts Language Models

Paper • 2409.02060 • Published Sep 3, 2024 • 79

upvoted a collection 7 months ago

Multimodal RAG

10 items • Updated Sep 5, 2024 • 27

upvoted 4 papers about 1 year ago

ReNoise: Real Image Inversion Through Iterative Noising

Paper • 2403.14602 • Published Mar 21, 2024 • 21

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

Paper • 2403.14520 • Published Mar 21, 2024 • 35

FlashTex: Fast Relightable Mesh Texturing with LightControlNet

Paper • 2402.13251 • Published Feb 20, 2024 • 15

Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models

Paper • 2402.13064 • Published Feb 20, 2024 • 48

upvoted a collection over 1 year ago

ICCV 2023 Demos

Demos for ICCV 2023 papers • 38 items • Updated Oct 5, 2023 • 9