fdsqefsgergd

T-representer

AI & ML interests

None yet

Organizations

None yet

T-representer's activity

upvoted 11 papers about 20 hours ago

HelpSteer2: Open-source dataset for training top-performing reward models

Paper • 2406.08673 • Published 2 days ago • 11

Transformers meet Neural Algorithmic Reasoners

Paper • 2406.09308 • Published 1 day ago • 18

EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

Paper • 2406.09162 • Published 1 day ago • 8

Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

Paper • 2406.09305 • Published 1 day ago • 4

LRM-Zero: Training Large Reconstruction Models with Synthesized Data

Paper • 2406.09371 • Published 1 day ago • 3

OpenVLA: An Open-Source Vision-Language-Action Model

Paper • 2406.09246 • Published 1 day ago • 17

DiTFastAttn: Attention Compression for Diffusion Transformer Models

Paper • 2406.08552 • Published 3 days ago • 15

Interpreting the Weight Space of Customized Diffusion Models

Paper • 2406.09413 • Published 1 day ago • 15

Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

Paper • 2406.09416 • Published 1 day ago • 22

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published 1 day ago • 33

Depth Anything V2

Paper • 2406.09414 • Published 1 day ago • 57

upvoted 33 papers 2 days ago

Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

Paper • 2406.08487 • Published 3 days ago • 10

Discovering Preference Optimization Algorithms with and for Large Language Models

Paper • 2406.08414 • Published 3 days ago • 12

AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation

Paper • 2406.07686 • Published 3 days ago • 13

FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

Paper • 2406.08392 • Published 3 days ago • 15

Hierarchical Patch Diffusion Models for High-Resolution Video Generation

Paper • 2406.07792 • Published 3 days ago • 13

Are We Done with MMLU?

Paper • 2406.04127 • Published 9 days ago • 28

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Paper • 2406.07476 • Published 4 days ago • 26

What If We Recaption Billions of Web Images with LLaMA-3?

Paper • 2406.08478 • Published 3 days ago • 32

PowerInfer-2: Fast Large Language Model Inference on a Smartphone

Paper • 2406.06282 • Published 5 days ago • 31

MotionClone: Training-Free Motion Cloning for Controllable Video Generation

Paper • 2406.05338 • Published 7 days ago • 35

Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

Paper • 2406.05629 • Published 6 days ago • 6

Simple and Effective Masked Diffusion Language Models

Paper • 2406.07524 • Published 4 days ago • 7

SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound

Paper • 2406.06612 • Published 8 days ago • 11

AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

Paper • 2406.06911 • Published 4 days ago • 10

Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

Paper • 2406.07394 • Published 4 days ago • 14

Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models

Paper • 2406.06563 • Published 12 days ago • 16

The Prompt Report: A Systematic Survey of Prompting Techniques

Paper • 2406.06608 • Published 9 days ago • 29

TextGrad: Automatic "Differentiation" via Text

Paper • 2406.07496 • Published 4 days ago • 21

Zero-shot Image Editing with Reference Imitation

Paper • 2406.07547 • Published 4 days ago • 27

An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published 4 days ago • 42

McEval: Massively Multilingual Code Evaluation

Paper • 2406.07436 • Published 4 days ago • 37

Vript: A Video Is Worth Thousands of Words

Paper • 2406.06040 • Published 5 days ago • 19

Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning

Paper • 2406.06469 • Published 5 days ago • 19

Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis

Paper • 2406.06216 • Published 5 days ago • 11

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Paper • 2406.05981 • Published 5 days ago • 10

Tx-LLM: A Large Language Model for Therapeutics

Paper • 2406.06316 • Published 5 days ago • 10

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

Paper • 2406.05370 • Published 7 days ago • 11

Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

Paper • 2406.06424 • Published 5 days ago • 8

Towards a Personal Health Large Language Model

Paper • 2406.06474 • Published 5 days ago • 10

MLCM: Multistep Consistency Distillation of Latent Diffusion Model

Paper • 2406.05768 • Published 6 days ago • 8

IllumiNeRF: 3D Relighting without Inverse Rendering

Paper • 2406.06527 • Published 5 days ago • 7

Unified Text-to-Image Generation and Retrieval

Paper • 2406.05814 • Published 6 days ago • 7

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Paper • 2406.06525 • Published 5 days ago • 53

upvoted 6 papers 4 days ago

Mixture-of-Agents Enhances Large Language Model Capabilities

Paper • 2406.04692 • Published 8 days ago • 36

GenAI Arena: An Open Evaluation Platform for Generative Models

Paper • 2406.04485 • Published 8 days ago • 17

CRAG -- Comprehensive RAG Benchmark

Paper • 2406.04744 • Published 8 days ago • 23

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Paper • 2406.04770 • Published 8 days ago • 22

Proofread: Fixes All Errors with One Tap

Paper • 2406.04523 • Published 8 days ago • 8

NATURAL PLAN: Benchmarking LLMs on Natural Language Planning

Paper • 2406.04520 • Published 8 days ago • 9

upvoted a paper 6 days ago

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published 9 days ago • 62

upvoted 6 papers 8 days ago

VideoTetris: Towards Compositional Text-to-Video Generation

Paper • 2406.04277 • Published 9 days ago • 19

SF-V: Single Forward Video Generation Model

Paper • 2406.04324 • Published 9 days ago • 20

pOps: Photo-Inspired Diffusion Operators

Paper • 2406.01300 • Published 12 days ago • 15

On Architectural Compression of Text-to-Image Diffusion Models

Paper • 2305.15798 • Published May 25, 2023 • 3

BitsFusion: 1.99 bits Weight Quantization of Diffusion Model

Paper • 2406.04333 • Published 9 days ago • 35

Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step

Paper • 2406.04314 • Published 9 days ago • 25

upvoted 3 papers 9 days ago

Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

Paper • 2406.01014 • Published 12 days ago • 28

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

Paper • 2406.03184 • Published 10 days ago • 16

Block Transformer: Global-to-Local Language Modeling for Fast Inference

Paper • 2406.02657 • Published 11 days ago • 32