CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery Paper • 2406.08587 • Published 3 days ago • 9
Article The CVPR Survival Guide: Discovering Research That's Interesting to YOU! By harpreetsahota • about 20 hours ago • 8
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Paper • 2406.07476 • Published 4 days ago • 26
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models Paper • 2406.06563 • Published 12 days ago • 16
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published 4 days ago • 43
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published 5 days ago • 54
Vript Collection A large-scale video-text dataset of high-resolution videos annotated with dense and detailed captions. • 8 items • Updated about 8 hours ago • 2
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild Paper • 2406.04770 • Published 8 days ago • 22
Mixture-of-Agents Enhances Large Language Model Capabilities Paper • 2406.04692 • Published 8 days ago • 36
GenAI Arena: An Open Evaluation Platform for Generative Models Paper • 2406.04485 • Published 9 days ago • 17
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published 9 days ago • 62
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step Paper • 2406.04314 • Published 9 days ago • 25
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models in 5 sizes: 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 29 items • Updated 9 days ago • 178
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion Paper • 2406.03184 • Published 10 days ago • 16
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception Paper • 2401.16158 • Published Jan 29 • 16
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration Paper • 2406.01014 • Published 12 days ago • 28
I4VGen: Image as Stepping Stone for Text-to-Video Generation Paper • 2406.02230 • Published 11 days ago • 14
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models Paper • 2406.02430 • Published 11 days ago • 25
V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation Paper • 2406.02511 • Published 11 days ago • 7
Learning Temporally Consistent Video Depth from Video Diffusion Priors Paper • 2406.01493 • Published 12 days ago • 16
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Paper • 2406.01574 • Published 12 days ago • 36
4Diffusion: Multi-view Video Diffusion Model for 4D Generation Paper • 2405.20674 • Published 15 days ago • 9
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis Paper • 2405.21075 • Published 15 days ago • 14
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published 15 days ago • 60
Jina CLIP: Your CLIP Model Is Also Your Text Retriever Paper • 2405.20204 • Published 16 days ago • 27
Xwin-LM: Strong and Scalable Alignment Practice for LLMs Paper • 2405.20335 • Published 16 days ago • 17
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback Paper • 2405.18750 • Published 17 days ago • 20
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series Paper • 2405.19327 • Published 17 days ago • 43
Article CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models 22 days ago • 18
Looking Backward: Streaming Video-to-Video Translation with Feature Banks Paper • 2405.15757 • Published 22 days ago • 14
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model Paper • 2405.04434 • Published May 7 • 10
AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct Paper • 2405.14906 • Published 23 days ago • 21
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published 22 days ago • 42
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models Paper • 2405.15574 • Published 22 days ago • 51
Aya 23: Open Weight Releases to Further Multilingual Progress Paper • 2405.15032 • Published 23 days ago • 21
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data Paper • 2405.14333 • Published 23 days ago • 28
ReVideo: Remake a Video with Motion and Content Control Paper • 2405.13865 • Published 24 days ago • 22
InternVL Collection InternVL Family: A Pioneering Open-Source Alternative to GPT-4V • 24 items • Updated 16 days ago • 13
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published 27 days ago • 53
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Paper • 2405.12130 • Published 26 days ago • 44
Imp-v1.5 Collection A series of Imp models with different LLM backbones. • 5 items • Updated 25 days ago • 4