Alex Martin

alexmartin1722

AI & ML interests

None yet

Recent Activity

upvoted a paper 3 days ago

SmolVLM: Redefining small and efficient multimodal models

upvoted a paper 3 days ago

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

upvoted a paper 3 days ago

Scaling Laws for Native Multimodal Models

alexmartin1722's activity

upvoted 3 papers 3 days ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published 7 days ago • 157

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

Paper • 2504.05541 • Published 7 days ago • 14

Scaling Laws for Native Multimodal Models

Paper • 2504.07951 • Published 4 days ago • 18

upvoted a collection 5 days ago

Kimi-VL-A3B

Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking • 6 items • Updated 2 days ago • 58

upvoted a collection 10 days ago

MultiVENT and MAGMAR Resources

Resources associated with the MultiVENT datasets, MAGMAR workshop, and other video retrieval and multimodal retrieval augmented generation • 5 items • Updated 10 days ago • 1

upvoted a paper 10 days ago

WikiVideo: Article Generation from Multiple Videos

Paper • 2504.00939 • Published 13 days ago • 36

upvoted a paper about 1 month ago

Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning

Paper • 2503.04973 • Published Mar 6 • 23

upvoted a collection about 1 month ago

LLaVA-OneVision

a model good at arbitrary types of visual input • 15 items • Updated Oct 5, 2024 • 24

upvoted 2 papers about 2 months ago

Rank1: Test-Time Compute for Reranking in Information Retrieval

Paper • 2502.18418 • Published Feb 25 • 26

Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering

Paper • 2502.13962 • Published Feb 19 • 28

upvoted 2 papers 3 months ago

Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion

Paper • 2501.09019 • Published Jan 15 • 12

Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding

Paper • 2501.07888 • Published Jan 14 • 15

upvoted 2 papers 4 months ago

Compressed Chain of Thought: Efficient Reasoning Through Dense Representations

Paper • 2412.13171 • Published Dec 17, 2024 • 35

GenEx: Generating an Explorable World

Paper • 2412.09624 • Published Dec 12, 2024 • 96

upvoted a paper 6 months ago

Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements

Paper • 2410.08968 • Published Oct 11, 2024 • 12

upvoted a collection 7 months ago

Molmo

Artifacts for open multimodal language models. • 5 items • Updated Mar 13 • 301

upvoted a paper 7 months ago

Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models

Paper • 2409.11136 • Published Sep 17, 2024 • 24

upvoted a paper 8 months ago

Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling

Paper • 2408.03695 • Published Aug 7, 2024 • 13

upvoted a paper over 1 year ago

Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation

Paper • 2308.07316 • Published Aug 14, 2023 • 7