Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation Paper • 2311.12028 • Published Nov 20, 2023 • 1
The Amazon Nova Family of Models: Technical Report and Model Card Paper • 2506.12103 • Published Mar 17, 2025 • 1
H₂OT: Hierarchical Hourglass Tokenizer for Efficient Video Pose Transformers Paper • 2509.06956 • Published Sep 8, 2025 • 1
Do Audio LLMs Really LISTEN, or Just Transcribe? Measuring Lexical vs. Acoustic Emotion Cues Reliance Paper • 2510.10444 • Published Oct 17, 2025 • 1
Ctrl&Shift: High-Quality Geometry-Aware Object Manipulation in Visual Generation Paper • 2602.11440 • Published Feb 11 • 1
Revisit Parameter-Efficient Transfer Learning: A Two-Stage Paradigm Paper • 2303.07910 • Published Mar 14, 2023 • 2
Making Vision Transformers Efficient from A Token Sparsification View Paper • 2303.08685 • Published Mar 15, 2023 • 1
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment Paper • 2307.12964 • Published Jul 24, 2023 • 1
Revisiting Vision Transformer from the View of Path Ensemble Paper • 2308.06548 • Published Aug 12, 2023 • 1
Hallucination of Multimodal Large Language Models: A Survey Paper • 2404.18930 • Published Apr 29, 2024 • 1
FlexDiT: Dynamic Token Density Control for Diffusion Transformer Paper • 2412.06028 • Published Dec 8, 2024 • 1
Selective Structured State-Spaces for Long-Form Video Understanding Paper • 2303.14526 • Published Mar 25, 2023 • 1
Social Structure Matters in 3D Human-Human Interaction Generation Paper • 2606.24255 • Published 10 days ago • 1
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels Paper • 2309.08513 • Published Sep 15, 2023 • 3
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos Paper • 2409.19603 • Published Sep 29, 2024 • 19