Papers to read - a vladbogo Collection

Papers to read

updated Sep 10

PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models

Paper • 2402.08714 • Published Feb 13 • 11
Data Engineering for Scaling Language Models to 128K Context

Paper • 2402.10171 • Published Feb 15 • 23
RLVF: Learning from Verbal Feedback without Overgeneralization

Paper • 2402.10893 • Published Feb 16 • 10
Coercing LLMs to do and reveal (almost) anything

Paper • 2402.14020 • Published Feb 21 • 12
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

Paper • 2402.14658 • Published Feb 22 • 82
TinyLLaVA: A Framework of Small-scale Large Multimodal Models

Paper • 2402.14289 • Published Feb 22 • 19
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21 • 112
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

Paper • 2402.16822 • Published Feb 26 • 15
Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29 • 49
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters

Paper • 2403.02677 • Published Mar 5 • 16
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

Paper • 2402.11753 • Published Feb 19 • 5
How Far Are We from Intelligent Visual Deductive Reasoning?

Paper • 2403.04732 • Published Mar 7 • 19
Evaluating and Mitigating Discrimination in Language Model Decisions

Paper • 2312.03689 • Published Dec 6, 2023 • 1
How predictable is language model benchmark performance?

Paper • 2401.04757 • Published Jan 9 • 2
PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits

Paper • 2305.02547 • Published May 4, 2023 • 7
Is Cosine-Similarity of Embeddings Really About Similarity?

Paper • 2403.05440 • Published Mar 8 • 3
Multistep Consistency Models

Paper • 2403.06807 • Published Mar 11 • 14
LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History

Paper • 2402.18216 • Published Feb 28 • 1
V3D: Video Diffusion Models are Effective 3D Generators

Paper • 2403.06738 • Published Mar 11 • 28
Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 157
RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15 • 67
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation

Paper • 2403.16990 • Published Mar 25 • 25
Can large language models explore in-context?

Paper • 2403.15371 • Published Mar 22 • 32
DreamLIP: Language-Image Pre-training with Long Captions

Paper • 2403.17007 • Published Mar 25 • 1
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models

Paper • 2403.20331 • Published Mar 29 • 14
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model

Paper • 2404.01331 • Published Mar 29 • 25
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3 • 65
Stream of Search (SoS): Learning to Search in Language

Paper • 2404.03683 • Published Apr 1 • 29
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Paper • 2404.05726 • Published Apr 8 • 20
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10 • 104
Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations

Paper • 2404.07153 • Published Apr 10 • 1
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Paper • 2404.07987 • Published Apr 11 • 47
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Paper • 2404.07973 • Published Apr 11 • 30
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Paper • 2404.09967 • Published Apr 15 • 20
MeshLRM: Large Reconstruction Model for High-Quality Mesh

Paper • 2404.12385 • Published Apr 18 • 26
TextSquare: Scaling up Text-Centric Visual Instruction Tuning

Paper • 2404.12803 • Published Apr 19 • 29
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Paper • 2404.13013 • Published Apr 19 • 30
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

Paper • 2404.14396 • Published Apr 22 • 18
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

Paper • 2404.14047 • Published Apr 22 • 44
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

Paper • 2404.14507 • Published Apr 22 • 21
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation

Paper • 2404.19427 • Published Apr 30 • 71
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published Apr 29 • 118
Corrective Retrieval Augmented Generation

Paper • 2401.15884 • Published Jan 29 • 3
Observational Scaling Laws and the Predictability of Language Model Performance

Paper • 2405.10938 • Published May 17 • 11
Your Transformer is Secretly Linear

Paper • 2405.12250 • Published May 19 • 149
Diffusion for World Modeling: Visual Details Matter in Atari

Paper • 2405.12399 • Published May 20 • 27
LANISTR: Multimodal Learning from Structured and Unstructured Data

Paper • 2305.16556 • Published May 26, 2023 • 2
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

Paper • 2405.11273 • Published May 18 • 17
Not All Language Model Features Are Linear

Paper • 2405.14860 • Published May 23 • 39
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

Paper • 2405.15319 • Published May 24 • 25
Phased Consistency Model

Paper • 2405.18407 • Published May 28 • 46
MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Paper • 2405.20340 • Published May 30 • 19
Xwin-LM: Strong and Scalable Alignment Practice for LLMs

Paper • 2405.20335 • Published May 30 • 17
LLMs achieve adult human performance on higher-order theory of mind tasks

Paper • 2405.18870 • Published May 29 • 17
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback

Paper • 2406.00888 • Published Jun 2 • 30
Guiding a Diffusion Model with a Bad Version of Itself

Paper • 2406.02507 • Published Jun 4 • 15
Self-Improving Robust Preference Optimization

Paper • 2406.01660 • Published Jun 3 • 18
PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM

Paper • 2406.02884 • Published Jun 5 • 15
Block Transformer: Global-to-Local Language Modeling for Fast Inference

Paper • 2406.02657 • Published Jun 4 • 37
Proofread: Fixes All Errors with One Tap

Paper • 2406.04523 • Published Jun 6 • 12
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Paper • 2406.07476 • Published Jun 11 • 32
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Paper • 2406.08407 • Published Jun 12 • 24
Large Language Model Unlearning via Embedding-Corrupted Prompts

Paper • 2406.07933 • Published Jun 12 • 7
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13 • 50
TextGrad: Automatic "Differentiation" via Text

Paper • 2406.07496 • Published Jun 11 • 27
Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Paper • 2406.10210 • Published Jun 14 • 76
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning

Paper • 2406.08973 • Published Jun 13 • 86
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

Paper • 2406.11833 • Published Jun 17 • 61
mDPO: Conditional Preference Optimization for Multimodal Large Language Models

Paper • 2406.11839 • Published Jun 17 • 37
How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Paper • 2406.11813 • Published Jun 17 • 30
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12 • 65
Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation

Paper • 2406.12849 • Published Jun 18 • 49
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Paper • 2406.12034 • Published Jun 17 • 14
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing

Paper • 2406.10601 • Published Jun 15 • 65
A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems

Paper • 2406.14972 • Published Jun 21 • 7
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution

Paper • 2406.13457 • Published Jun 19 • 16
PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers

Paper • 2406.12430 • Published Jun 18 • 7
Weight subcloning: direct initialization of transformers using larger pretrained ones

Paper • 2312.09299 • Published Dec 14, 2023 • 17
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

Paper • 2406.17419 • Published Jun 25 • 16
Large Language Models Assume People are More Rational than We Really are

Paper • 2406.17055 • Published Jun 24 • 4
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

Paper • 2406.16855 • Published Jun 24 • 54
Video-Infinity: Distributed Long Video Generation

Paper • 2406.16260 • Published Jun 24 • 28
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models

Paper • 2406.16338 • Published Jun 24 • 25
Adam-mini: Use Fewer Learning Rates To Gain More

Paper • 2406.16793 • Published Jun 24 • 67
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Paper • 2406.19389 • Published Jun 27 • 52
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

Paper • 2406.20095 • Published Jun 28 • 17
MagMax: Leveraging Model Merging for Seamless Continual Learning

Paper • 2407.06322 • Published Jul 8 • 1
A Single Transformer for Scalable Vision-Language Modeling

Paper • 2407.06438 • Published Jul 8 • 1
Video Diffusion Alignment via Reward Gradients

Paper • 2407.08737 • Published Jul 11 • 48
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models

Paper • 2407.09025 • Published Jul 12 • 129
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

Paper • 2407.09121 • Published Jul 12 • 5
GAVEL: Generating Games Via Evolution and Language Models

Paper • 2407.09388 • Published Jul 12 • 14
Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data

Paper • 2407.08726 • Published Jul 11 • 8
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

Paper • 2407.08303 • Published Jul 11 • 17
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

Paper • 2407.11963 • Published Jul 16 • 43
How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?

Paper • 2402.07282 • Published Feb 11 • 1
Fewer Truncations Improve Language Modeling

Paper • 2404.10830 • Published Apr 16 • 3
One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts

Paper • 2407.00256 • Published Jun 28 • 1
Provably Robust DPO: Aligning Language Models with Noisy Feedback

Paper • 2403.00409 • Published Mar 1 • 1
Efficient Exploration for LLMs

Paper • 2402.00396 • Published Feb 1 • 21
Text2SQL is Not Enough: Unifying AI and Databases with TAG

Paper • 2408.14717 • Published Aug 27 • 24
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance

Paper • 2409.04593 • Published Sep 6 • 23
