Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2403.18361

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6 • 21
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6 • 9
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7 • 31
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7 • 19

vision foundation modesl

vision foundation models

about 11 hours ago

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27 • 48
BRAVE: Broadening the visual encoding of vision-language models

Paper • 2404.07204 • Published Apr 10 • 14
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

Paper • 2404.15653 • Published Apr 24 • 24
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published 12 days ago • 92

Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation

Paper • 2403.16990 • Published Mar 25 • 24
ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27 • 48
Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Paper • 2404.01197 • Published Apr 1 • 29
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models

Paper • 2404.01367 • Published Apr 1 • 19

transformer-techniques

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2 • 101
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs

Paper • 2403.20041 • Published Mar 29 • 34
ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27 • 48

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27 • 48

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27 • 48

Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28 • 99
sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28 • 31
ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27 • 48
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Paper • 2403.18814 • Published Mar 27 • 40

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27 • 48

Papers - Encoders - Synthetic Noise

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27 • 48

Papers - Image - Generator - Large Resolution

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27 • 48

Previous
1
2
3
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs