limingcv (Ming Li)

upvoted 2 papers about 20 hours ago

Depth Anything V2

Paper • 2406.09414 • Published 1 day ago • 55

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published 1 day ago • 33

upvoted a paper 4 days ago

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Paper • 2406.06525 • Published 4 days ago • 53

upvoted 2 papers 7 days ago

Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step

Paper • 2406.04314 • Published 8 days ago • 25

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published 8 days ago • 62

upvoted a paper 16 days ago

Phased Consistency Model

Paper • 2405.18407 • Published 17 days ago • 43

upvoted a paper 18 days ago

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published 18 days ago • 75

upvoted 2 papers about 1 month ago

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published May 2 • 105

LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29 • 116

upvoted 4 papers about 2 months ago

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Paper • 2404.16821 • Published Apr 25 • 49

upvoted an article about 2 months ago

Welcome Llama 3 - Meta's new open LLM

Article • Apr 18 • 250

upvoted 16 papers 2 months ago

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Paper • 2404.07987 • Published Apr 11 • 46

HGRN2: Gated Linear RNNs with State Expansion

Paper • 2404.07904 • Published Apr 11 • 16

Rho-1: Not All Tokens Are What You Need

Paper • 2404.07965 • Published Apr 11 • 80

Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models

Paper • 2404.07724 • Published Apr 11 • 10

Best Practices and Lessons Learned on Synthetic Data for Language Models

Paper • 2404.07503 • Published Apr 11 • 24

Sparse Laneformer

Paper • 2404.07821 • Published Apr 11 • 9

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

Paper • 2404.07544 • Published Apr 11 • 15

LLoCO: Learning Long Contexts Offline

Paper • 2404.07979 • Published Apr 11 • 15

Audio Dialogues: Dialogues dataset for audio and music understanding

Paper • 2404.07616 • Published Apr 11 • 14

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Paper • 2404.07839 • Published Apr 11 • 40

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Paper • 2404.07973 • Published Apr 11 • 28

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11 • 32

Transferable and Principled Efficiency for Open-Vocabulary Segmentation

Paper • 2404.07448 • Published Apr 11 • 10

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Paper • 2404.07972 • Published Apr 11 • 41

UniFL: Improve Stable Diffusion via Unified Feedback Learning

Paper • 2404.05595 • Published Apr 8 • 22

ByteEdit: Boost, Comply and Accelerate Generative Image Editing

Paper • 2404.04860 • Published Apr 7 • 24

upvoted 3 papers 3 months ago

When Do We Not Need Larger Vision Models?

Paper • 2403.13043 • Published Mar 19 • 24

CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion

Paper • 2403.05121 • Published Mar 8 • 17

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Paper • 2403.05530 • Published Mar 8 • 51

upvoted 13 papers 4 months ago

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Paper • 2402.19479 • Published Feb 29 • 30

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Paper • 2402.15627 • Published Feb 23 • 32

Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion

Paper • 2402.10009 • Published Feb 15 • 18

Data Engineering for Scaling Language Models to 128K Context

Paper • 2402.10171 • Published Feb 15 • 18

GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering

Paper • 2402.10128 • Published Feb 15 • 14

Rolling Diffusion Models

Paper • 2402.09470 • Published Feb 12 • 8

DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization

Paper • 2402.09812 • Published Feb 15 • 11

Generative Representational Instruction Tuning

Paper • 2402.09906 • Published Feb 15 • 50

OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset

Paper • 2402.10176 • Published Feb 15 • 33

How to Train Data-Efficient LLMs

Paper • 2402.09668 • Published Feb 15 • 34

A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts

Paper • 2402.09727 • Published Feb 15 • 35

BitDelta: Your Fine-Tune May Only Be Worth One Bit

Paper • 2402.10193 • Published Feb 15 • 17

Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15 • 91

upvoted 3 papers 5 months ago

DiffusionGPT: LLM-Driven Text-to-Image Generation System

Paper • 2401.10061 • Published Jan 18 • 26

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18 • 13

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18 • 135

upvoted a paper 6 months ago

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

Paper • 2312.09390 • Published Dec 14, 2023 • 32

upvoted a paper 11 months ago

AlignDet: Aligning Pre-training and Fine-tuning in Object Detection

Paper • 2307.11077 • Published Jul 20, 2023 • 1

