Daniel Bourke PRO

mrdbourke

https://www.mrdbourke.com

AI & ML interests

Computer vision. Small on-device models. VLMs. High-quality tutorials.

Recent Activity

upvoted 2 collections about 8 hours ago

updated a Space about 9 hours ago

mrdbourke/foodvision_big_video

Organizations

None yet

mrdbourke's activity

upvoted 2 collections about 8 hours ago

Web-SSL

17 items • Updated about 17 hours ago • 3

DataDecide

A suite of models, data, and evals over 25 corpora, 14 sizes, and 3 seeds to measure how accurately small experiments predict rankings at large scale. • 358 items • Updated 8 days ago • 13

upvoted an article about 9 hours ago

Article

Cohere on Hugging Face Inference Providers 🔥

9 days ago • 113

upvoted a collection about 14 hours ago

Describe Anything

Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 6 items • Updated 1 day ago • 30

upvoted a collection 6 days ago

Perception Encoder

9 items • Updated 7 days ago • 23

upvoted a collection 15 days ago

Kimi-VL-A3B

Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking • 6 items • Updated 12 days ago • 61

upvoted a collection 17 days ago

Gemma 3 QAT

Quantization Aware Trained (QAT) Gemma 3 checkpoints. The model preserves similar quality as half precision while using 3x less memory • 15 items • Updated 6 days ago • 161

upvoted an article 19 days ago

Article

Welcome Llama 4 Maverick & Scout on Hugging Face!

20 days ago • 140

upvoted a paper 20 days ago

Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources

Paper • 2504.00595 • Published 23 days ago • 35

upvoted a collection 23 days ago

ShieldGemma

ShieldGemma is a family of models for text and image content moderation. • 4 items • Updated 21 days ago • 6

upvoted an article 23 days ago

Article

Training and Finetuning Reranker Models with Sentence Transformers v4

30 days ago • 119

upvoted a paper 23 days ago

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14 • 97

upvoted a collection 24 days ago

Vision Language Models Quantization

Vision Language Models (VLMs) quantized by Neural Magic • 20 items • Updated Mar 4 • 6

upvoted an article 28 days ago

Article

Welcome to Inference Providers on the Hub 🔥

Jan 28 • 479

upvoted a collection about 1 month ago

LLM2CLIP

LLM2CLIP makes SOTA pretrained CLIP models more SOTA than ever. • 11 items • Updated 7 days ago • 60

upvoted an article about 1 month ago

Article

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

Mar 12 • 399

upvoted a collection about 2 months ago

olmOCR

olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 4 items • Updated Mar 19 • 106

upvoted an article about 2 months ago

Article

FastRTC: The Real-Time Communication Library for Python

Feb 25 • 158

upvoted 2 collections about 2 months ago

Granite Vision Models

3 items • Updated 8 days ago • 13

SmolVLM2 📺 Smallest video LM ever 🤏🏻

11 items • Updated Feb 25 • 82