Computer Vision - a PeppePasti Collection

PeppePasti 's Collections

LLMs

Multimodal LLMs

RAG

Agents

Reinforcement learning (RL)

Liquid Neural Networks

Diffusion Models

Text Embedding & Rankers

Computer Vision

Multi-lingual Training Language Models

NLP (no LLM related)

Interesting Stuffs

Computer Vision

updated Sep 27, 2024

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Paper • 2409.02095 • Published Sep 3, 2024 • 36
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3, 2024 • 83
CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation

Paper • 2409.03643 • Published Sep 5, 2024 • 19
UniDet3D: Multi-dataset Indoor 3D Object Detection

Paper • 2409.04234 • Published Sep 6, 2024 • 9
Evaluating Multiview Object Consistency in Humans and Image Models

Paper • 2409.05862 • Published Sep 9, 2024 • 10
LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation

Paper • 2409.06703 • Published Sep 10, 2024 • 3
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models

Paper • 2409.07452 • Published Sep 11, 2024 • 20
Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering

Paper • 2409.07441 • Published Sep 11, 2024 • 12
InstantDrag: Improving Interactivity in Drag-based Image Editing

Paper • 2409.08857 • Published Sep 13, 2024 • 33
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling

Paper • 2409.16160 • Published Sep 24, 2024 • 33
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction

Paper • 2409.18124 • Published Sep 26, 2024 • 32