Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2409.18869

Perception and abstraction. Each modality is tokenized and embedded into vectors for model to comprehend.

VILA^2: VILA Augmented VILA

Paper • 2407.17453 • Published Jul 24 • 39
Octopus v4: Graph of language models

Paper • 2404.19296 • Published Apr 30 • 116
Octo-planner: On-device Language Model for Planner-Action Agents

Paper • 2406.18082 • Published Jun 26 • 47
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

Paper • 2408.15518 • Published Aug 28 • 42

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Paper • 2406.16860 • Published Jun 24 • 59
PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10 • 68
E5-V: Universal Embeddings with Multimodal Large Language Models

Paper • 2407.12580 • Published Jul 17 • 39
Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published Sep 27 • 93

Papers I want to read

Papers in my to-read list

RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13 • 66
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 126
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24 • 53
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 87

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24 • 12
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24 • 53
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 87
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27 • 31

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Paper • 2311.17049 • Published Nov 28, 2023 • 1
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7 • 14
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Paper • 2303.17376 • Published Mar 30, 2023
Sigmoid Loss for Language Image Pre-Training

Paper • 2303.15343 • Published Mar 27, 2023 • 5

To read... eventually

A collection of papers that i have read or plan to read all in one place. Includes a wide range of topics.

about 10 hours ago

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14 • 125
Evolutionary Optimization of Model Merging Recipes

Paper • 2403.13187 • Published Mar 19 • 50
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

Paper • 2402.03766 • Published Feb 6 • 12
LLM Agent Operating System

Paper • 2403.16971 • Published Mar 25 • 65

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

Paper • 2401.15977 • Published Jan 29 • 37
Lumiere: A Space-Time Diffusion Model for Video Generation

Paper • 2401.12945 • Published Jan 23 • 86
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

Paper • 2307.04725 • Published Jul 10, 2023 • 64
Boximator: Generating Rich and Controllable Motions for Video Synthesis

Paper • 2402.01566 • Published Feb 2 • 26

Previous
1
2
3
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs