dashfunnydashdash (J)

upvoted a paper about 14 hours ago

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Paper • 2405.20340 • Published 2 days ago • 11

upvoted a paper about 15 hours ago

Jina CLIP: Your CLIP Model Is Also Your Text Retriever

Paper • 2405.20204 • Published 3 days ago • 17

upvoted a paper 1 day ago

Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts

Paper • 2405.19893 • Published 3 days ago • 13

upvoted a paper 2 days ago

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

Paper • 2405.19327 • Published 3 days ago • 34

upvoted 4 papers 5 days ago

Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels

Paper • 2405.16822 • Published 6 days ago • 10

upvoted 6 papers 6 days ago

ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

Paper • 2405.15738 • Published 8 days ago • 41

Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

Paper • 2405.15071 • Published 9 days ago • 30

Aya 23: Open Weight Releases to Further Multilingual Progress

Paper • 2405.15032 • Published 9 days ago • 21

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published 9 days ago • 45

CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner

Paper • 2405.14979 • Published 9 days ago • 13

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published 9 days ago • 11

upvoted a paper 9 days ago

ReVideo: Remake a Video with Motion and Content Control

Paper • 2405.13865 • Published 10 days ago • 19

upvoted 2 papers 12 days ago

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Paper • 2405.11143 • Published 13 days ago • 33

FIFO-Diffusion: Generating Infinite Videos from Text without Training

Paper • 2405.11473 • Published 14 days ago • 50

upvoted 2 papers 14 days ago

Toon3D: Seeing Cartoons from a New Perspective

Paper • 2405.10320 • Published 16 days ago • 19

CAT3D: Create Anything in 3D with Multi-View Diffusion Models

Paper • 2405.10314 • Published 16 days ago • 37

upvoted 2 papers 16 days ago

Many-Shot In-Context Learning in Multimodal Foundation Models

Paper • 2405.09798 • Published 17 days ago • 25

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published 17 days ago • 95

upvoted a paper 18 days ago

What matters when building vision-language models?

Paper • 2405.02246 • Published 29 days ago • 87

upvoted an article 19 days ago

License to Call: Introducing Transformers Agents 2.0

Article • 20 days ago • 89

upvoted 2 papers 30 days ago

LLM-AD: Large Language Model based Audio Description System

Paper • 2405.00983 • Published May 2 • 13

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

Paper • 2405.01434 • Published about 1 month ago • 44

upvoted 18 papers about 1 month ago

Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Paper • 2404.18212 • Published Apr 28 • 20

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Paper • 2404.16821 • Published Apr 25 • 49

NeRF-XL: Scaling NeRFs with Multiple GPUs

Paper • 2404.16221 • Published Apr 24 • 11

Interactive3D: Create What You Want by Interactive 3D Generation

Paper • 2404.16510 • Published Apr 25 • 18

MotionMaster: Training-free Camera Motion Transfer For Video Generation

Paper • 2404.15789 • Published Apr 24 • 10

SnapKV: LLM Knows What You are Looking for Before Generation

Paper • 2404.14469 • Published Apr 22 • 23

Transformers Can Represent n-gram Language Models

Paper • 2404.14994 • Published Apr 23 • 18

Pegasus-v1 Technical Report

Paper • 2404.14687 • Published Apr 23 • 29

Multi-Head Mixture-of-Experts

Paper • 2404.15045 • Published Apr 23 • 55

Learn2Talk: 3D Talking Face Learns from 2D Talking Face

Paper • 2404.12888 • Published Apr 19 • 2

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22 • 238

TextSquare: Scaling up Text-Centric Visual Instruction Tuning

Paper • 2404.12803 • Published Apr 19 • 27

PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation

Paper • 2404.13026 • Published Apr 19 • 21

Does Gaussian Splatting need SFM Initialization?

Paper • 2404.12547 • Published Apr 18 • 8

MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation

Paper • 2404.11565 • Published Apr 17 • 12

BLINK: Multimodal Large Language Models Can See but Not Perceive

Paper • 2404.12390 • Published Apr 18 • 23

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Paper • 2404.11912 • Published Apr 18 • 16

Long-form music generation with latent diffusion

Paper • 2404.10301 • Published Apr 16 • 23

upvoted 17 papers about 2 months ago

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12 • 62

Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video

Paper • 2404.09833 • Published Apr 15 • 27

Compression Represents Intelligence Linearly

Paper • 2404.09937 • Published Apr 15 • 27

Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior

Paper • 2404.06780 • Published Apr 10 • 9

Rho-1: Not All Tokens Are What You Need

Paper • 2404.07965 • Published Apr 11 • 80

OmniFusion Technical Report

Paper • 2404.06212 • Published Apr 9 • 73

RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion

Paper • 2404.07199 • Published Apr 10 • 22

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Paper • 2404.07987 • Published Apr 11 • 46

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

Paper • 2404.03715 • Published Apr 4 • 58

MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens

Paper • 2404.03413 • Published Apr 4 • 21

PointInfinity: Resolution-Invariant Point Diffusion Models

Paper • 2404.03566 • Published Apr 4 • 13

Training LLMs over Neurally Compressed Text

Paper • 2404.03626 • Published Apr 4 • 21

GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image

Paper • 2404.02152 • Published Apr 2 • 3

Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition

Paper • 2404.02514 • Published Apr 3 • 9

Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes

Paper • 2404.01543 • Published Apr 2 • 3

Advancing LLM Reasoning Generalists with Preference Trees

Paper • 2404.02078 • Published Apr 2 • 41

LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model

Paper • 2404.01331 • Published Mar 29 • 22
