Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published 5 days ago • 69
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • 2406.11931 • Published 16 days ago • 54
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing Paper • 2406.06523 • Published 23 days ago • 48
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published 21 days ago • 48
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering Paper • 2406.10208 • Published 19 days ago • 21
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices Paper • 2406.08451 • Published 21 days ago • 23
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text Paper • 2406.08418 • Published 21 days ago • 28
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack Paper • 2406.10149 • Published 19 days ago • 47
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation Paper • 2406.09961 • Published 19 days ago • 54
Make It Count: Text-to-Image Generation with an Accurate Number of Objects Paper • 2406.10210 • Published 19 days ago • 74
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning Paper • 2406.08973 • Published 20 days ago • 85
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers Paper • 2406.10163 • Published 19 days ago • 27
VideoLLM-online: Online Video Large Language Model for Streaming Video Paper • 2406.11816 • Published 16 days ago • 20
How Do Large Language Models Acquire Factual Knowledge During Pretraining? Paper • 2406.11813 • Published 16 days ago • 28
THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation Paper • 2406.10996 • Published 17 days ago • 31
DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published 16 days ago • 39
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs Paper • 2406.11833 • Published 16 days ago • 60
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B Paper • 2406.07394 • Published 22 days ago • 17
Explore the Limits of Omni-modal Pretraining at Scale Paper • 2406.09412 • Published 20 days ago • 10
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus Paper • 2406.08707 • Published 20 days ago • 14
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery Paper • 2406.08587 • Published 21 days ago • 15
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts Paper • 2406.09162 • Published 20 days ago • 13
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models Paper • 2406.09403 • Published 20 days ago • 17
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding Paper • 2406.09411 • Published 20 days ago • 17
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning Paper • 2406.09170 • Published 20 days ago • 23
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling Paper • 2406.07522 • Published 22 days ago • 35
Interpreting the Weight Space of Customized Diffusion Models Paper • 2406.09413 • Published 20 days ago • 18
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models Paper • 2406.09416 • Published 20 days ago • 28
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels Paper • 2406.09415 • Published 20 days ago • 47
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model Paper • 2406.04333 • Published 27 days ago • 36
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published 27 days ago • 69
Qwen2 Collection Qwen2 language models, with pretrained and instruction-tuned variants in 5 sizes: 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 29 items • Updated 27 days ago • 226
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts Paper • 2405.19893 • Published May 30 • 26
4Diffusion: Multi-view Video Diffusion Model for 4D Generation Paper • 2405.20674 • Published May 31 • 10
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models Paper • 2405.20541 • Published May 30 • 18
Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling Paper • 2405.21048 • Published May 31 • 11
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis Paper • 2405.21075 • Published May 31 • 15
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published May 31 • 60
Contextual Position Encoding: Learning to Count What's Important Paper • 2405.18719 • Published May 29 • 3
On the Planning Abilities of Large Language Models -- A Critical Investigation Paper • 2305.15771 • Published May 25, 2023 • 1
Diffusion for World Modeling: Visual Details Matter in Atari Paper • 2405.12399 • Published May 20 • 25
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 22 items • Updated May 31 • 355
Imp: Highly Capable Large Multimodal Models for Mobile Devices Paper • 2405.12107 • Published May 20 • 23
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Paper • 2405.12130 • Published May 20 • 44
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29 • 116