- Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models
  Paper • 2309.01674 • Published • 2
- Segment Anything
  Paper • 2304.02643 • Published • 2
- EgoLifter: Open-world 3D Segmentation for Egocentric Perception
  Paper • 2403.18118 • Published • 9
- A Multimodal Automated Interpretability Agent
  Paper • 2404.14394 • Published • 20
Collections including paper arxiv:2404.14394
- Demystifying CLIP Data
  Paper • 2309.16671 • Published • 19
- Model Stock: All we need is just a few fine-tuned models
  Paper • 2403.19522 • Published • 10
- Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
  Paper • 2404.01367 • Published • 19
- On the Scalability of Diffusion-based Text-to-Image Generation
  Paper • 2404.02883 • Published • 17
- Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology
  Paper • 2203.00585 • Published • 2
- Emerging Properties in Self-Supervised Vision Transformers
  Paper • 2104.14294 • Published • 3
- DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
  Paper • 2404.06903 • Published • 17
- Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
  Paper • 2404.07973 • Published • 30
- Wide Residual Networks
  Paper • 1605.07146 • Published • 2
- Characterizing signal propagation to close the performance gap in unnormalized ResNets
  Paper • 2101.08692 • Published • 2
- Pareto-Optimal Quantized ResNet Is Mostly 4-bit
  Paper • 2105.03536 • Published • 2
- When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
  Paper • 2106.01548 • Published • 2
- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 25
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 12
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 36
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 19
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 36
- LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
  Paper • 2402.11550 • Published • 15
- A Multimodal Automated Interpretability Agent
  Paper • 2404.14394 • Published • 20