matlok (Welcome to matlok)

upvoted 13 papers about 2 hours ago

HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting

Paper • 2405.15125 • Published 4 days ago • 3

Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition

Paper • 2405.15216 • Published 3 days ago • 4

Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Paper • 2405.15613 • Published 3 days ago • 5

AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct

Paper • 2405.14906 • Published 5 days ago • 5

Data Mixing Made Efficient: A Bivariate Scaling Law for Language Model Pretraining

Paper • 2405.14908 • Published 4 days ago • 6

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published 3 days ago • 7

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

Paper • 2405.15319 • Published 3 days ago • 8

CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner

Paper • 2405.14979 • Published 4 days ago • 8

Aya 23: Open Weight Releases to Further Multilingual Progress

Paper • 2405.15032 • Published 4 days ago • 11

The Road Less Scheduled

Paper • 2405.15682 • Published 3 days ago • 10

Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

Paper • 2405.15071 • Published 4 days ago • 17

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published 3 days ago • 26

ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

Paper • 2405.15738 • Published 3 days ago • 28

upvoted a paper about 4 hours ago

Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models

Paper • 2312.02969 • Published Dec 5, 2023 • 12

upvoted 4 papers 2 days ago

You Only Cache Once: Decoder-Decoder Architectures for Language Models

Paper • 2405.05254 • Published 19 days ago • 7

NeRF-Casting: Improved View-Dependent Appearance with Consistent Reflections

Paper • 2405.14871 • Published 4 days ago • 5

Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras

Paper • 2405.14866 • Published 4 days ago • 5

Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling

Paper • 2405.14847 • Published 4 days ago • 6

upvoted 14 papers 3 days ago

CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers

Paper • 2405.13195 • Published 6 days ago • 6

Semantica: An Adaptable Image-Conditioned Diffusion Model

Paper • 2405.14857 • Published 4 days ago • 5

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Paper • 2405.14333 • Published 4 days ago • 22

LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models

Paper • 2405.14477 • Published 4 days ago • 13

Improved Distribution Matching Distillation for Fast Image Synthesis

Paper • 2405.14867 • Published 4 days ago • 9

Dense Connector for MLLMs

Paper • 2405.13800 • Published 5 days ago • 15

Not All Language Model Features Are Linear

Paper • 2405.14860 • Published 4 days ago • 28

ReVideo: Remake a Video with Motion and Content Control

Paper • 2405.13865 • Published 5 days ago • 19

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

Paper • 2405.14598 • Published 4 days ago • 9

RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance

Paper • 2405.14677 • Published 4 days ago • 8

upvoted 7 papers 5 days ago

Images that Sound: Composing Images and Sounds on a Single Canvas

Paper • 2405.12221 • Published 7 days ago • 1

Personalized Residuals for Concept-Driven Text-to-Image Generation

Paper • 2405.12978 • Published 6 days ago • 8

OmniGlue: Generalizable Feature Matching with Foundation Model Guidance

Paper • 2405.12979 • Published 6 days ago • 7

Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control

Paper • 2405.12970 • Published 6 days ago • 20

Diffusion for World Modeling: Visual Details Matter in Atari

Paper • 2405.12399 • Published 7 days ago • 25

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Paper • 2405.12981 • Published 6 days ago • 22

Your Transformer is Secretly Linear

Paper • 2405.12250 • Published 8 days ago • 116

upvoted 8 papers 6 days ago

SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization

Paper • 2405.11582 • Published 8 days ago • 10

Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching

Paper • 2405.11252 • Published 9 days ago • 11

Towards Modular LLMs by Building and Reusing a Library of LoRAs

Paper • 2405.11157 • Published 10 days ago • 22

FIFO-Diffusion: Generating Infinite Videos from Text without Training

Paper • 2405.11473 • Published 8 days ago • 48

Octo: An Open-Source Generalist Robot Policy

Paper • 2405.12213 • Published 7 days ago • 22

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Paper • 2405.11143 • Published 8 days ago • 31

Imp: Highly Capable Large Multimodal Models for Mobile Devices

Paper • 2405.12107 • Published 7 days ago • 21

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published 7 days ago • 37

upvoted a paper 9 days ago

An Empirical Evaluation of Columnar Storage Formats

Paper • 2304.05028 • Published Apr 11, 2023 • 1

upvoted 8 papers 10 days ago

Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion

Paper • 2405.09874 • Published 11 days ago • 14

TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction

Paper • 2405.10315 • Published 11 days ago • 9

Toon3D: Seeing Cartoons from a New Perspective

Paper • 2405.10320 • Published 11 days ago • 19

CAT3D: Create Anything in 3D with Multi-View Diffusion Models

Paper • 2405.10314 • Published 11 days ago • 37

Many-Shot In-Context Learning in Multimodal Foundation Models

Paper • 2405.09798 • Published 11 days ago • 24

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published 11 days ago • 22

LoRA Learns Less and Forgets Less

Paper • 2405.09673 • Published 12 days ago • 70

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published 11 days ago • 91

upvoted 3 papers 11 days ago

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

Paper • 2405.09546 • Published 12 days ago • 9

Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model

Paper • 2405.09215 • Published 12 days ago • 14

ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models

Paper • 2405.09220 • Published 12 days ago • 22

upvoted a paper 12 days ago

No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding

Paper • 2405.08344 • Published 13 days ago • 10
