Vince

bolerovt

bolerovt's activity

upvoted 10 papers 7 days ago

LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models

Paper • 2405.18377 • Published 13 days ago • 14

2BP: 2-Stage Backpropagation

Paper • 2405.18047 • Published 13 days ago • 21

Yuan 2.0-M32: Mixture of Experts with Attention Router

Paper • 2405.17976 • Published 13 days ago • 18

Phased Consistency Model

Paper • 2405.18407 • Published 13 days ago • 42

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

Paper • 2405.19327 • Published 12 days ago • 41

Xwin-LM: Strong and Scalable Alignment Practice for LLMs

Paper • 2405.20335 • Published 11 days ago • 16

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Paper • 2405.20340 • Published 11 days ago • 16

Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts

Paper • 2405.19893 • Published 11 days ago • 22

Jina CLIP: Your CLIP Model Is Also Your Text Retriever

Paper • 2405.20204 • Published 11 days ago • 26

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Paper • 2405.21060 • Published 10 days ago • 57

upvoted 5 papers 13 days ago

CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner

Paper • 2405.14979 • Published 18 days ago • 14

Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

Paper • 2405.15071 • Published 18 days ago • 30

Transformers Can Do Arithmetic with the Right Embeddings

Paper • 2405.17399 • Published 14 days ago • 49

NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

Paper • 2405.17428 • Published 14 days ago • 13

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published 14 days ago • 71

upvoted 5 papers 14 days ago

Many-Shot In-Context Learning in Multimodal Foundation Models

Paper • 2405.09798 • Published 25 days ago • 25

INDUS: Effective and Efficient Language Models for Scientific Applications

Paper • 2405.10725 • Published 24 days ago • 23

FIFO-Diffusion: Generating Infinite Videos from Text without Training

Paper • 2405.11473 • Published 22 days ago • 53

ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

Paper • 2405.15738 • Published 17 days ago • 42

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published 17 days ago • 49

upvoted 7 papers 15 days ago

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Paper • 2405.12981 • Published 20 days ago • 25

Diffusion for World Modeling: Visual Details Matter in Atari

Paper • 2405.12399 • Published 21 days ago • 25

Your Transformer is Secretly Linear

Paper • 2405.12250 • Published 22 days ago • 138

Dense Connector for MLLMs

Paper • 2405.13800 • Published 19 days ago • 20

ReVideo: Remake a Video with Motion and Content Control

Paper • 2405.13865 • Published 19 days ago • 21

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Paper • 2405.14333 • Published 18 days ago • 27

Not All Language Model Features Are Linear

Paper • 2405.14860 • Published 18 days ago • 35

upvoted 27 papers 20 days ago

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Paper • 2405.07526 • Published 28 days ago • 15

SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts

Paper • 2405.07518 • Published 28 days ago • 21

SUTRA: Scalable Multilingual Language Model Architecture

Paper • 2405.06694 • Published May 7 • 34

RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published 28 days ago • 62

What matters when building vision-language models?

Paper • 2405.02246 • Published May 3 • 88

SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models

Paper • 2405.08317 • Published 27 days ago • 8

SpeechVerse: A Large-scale Generalizable Audio Language Model

Paper • 2405.08295 • Published 27 days ago • 10

No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding

Paper • 2405.08344 • Published 27 days ago • 10

Understanding the performance gap between online and offline alignment algorithms

Paper • 2405.08448 • Published 27 days ago • 11

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Paper • 2405.08748 • Published 27 days ago • 17

Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning

Paper • 2405.08054 • Published 28 days ago • 21

Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory

Paper • 2405.08707 • Published 27 days ago • 26

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

Paper • 2405.09546 • Published 26 days ago • 9

ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models

Paper • 2405.09220 • Published 26 days ago • 23

TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction

Paper • 2405.10315 • Published 25 days ago • 9

Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion

Paper • 2405.09874 • Published 25 days ago • 15

Toon3D: Seeing Cartoons from a New Perspective

Paper • 2405.10320 • Published 25 days ago • 19

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published 25 days ago • 24

CAT3D: Create Anything in 3D with Multi-View Diffusion Models

Paper • 2405.10314 • Published 25 days ago • 38

LoRA Learns Less and Forgets Less

Paper • 2405.09673 • Published 26 days ago • 75

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published 25 days ago • 101

Layer-Condensed KV Cache for Efficient Inference of Large Language Models

Paper • 2405.10637 • Published 24 days ago • 17

Towards Modular LLMs by Building and Reusing a Library of LoRAs

Paper • 2405.11157 • Published 23 days ago • 23

Octo: An Open-Source Generalist Robot Policy

Paper • 2405.12213 • Published 21 days ago • 22

Imp: Highly Capable Large Multimodal Models for Mobile Devices

Paper • 2405.12107 • Published 21 days ago • 23

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Paper • 2405.11143 • Published 22 days ago • 33

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published 21 days ago • 42

upvoted 6 papers about 1 month ago

Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations

Paper • 2404.17521 • Published Apr 26 • 12

Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

Paper • 2404.18796 • Published Apr 29 • 66

GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

Paper • 2404.19702 • Published Apr 30 • 17

Extending Llama-3's Context Ten-Fold Overnight

Paper • 2404.19553 • Published Apr 30 • 30

Better & Faster Large Language Models via Multi-token Prediction

Paper • 2404.19737 • Published Apr 30 • 64

InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation

Paper • 2404.19427 • Published Apr 30 • 68