Josiah Aklilu

josaklil-ai

https://josaklil-ai.github.io/

AI & ML interests

computer vision & language for enhancing surgical practice

Recent Activity

upvoted a paper 15 days ago

SmolVLM: Redefining small and efficient multimodal models

upvoted a paper about 1 month ago

MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research

upvoted a paper about 1 month ago

Video Action Differencing


Organizations

None yet

josaklil-ai's activity

upvoted a paper 15 days ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published 16 days ago • 170

upvoted 2 papers about 1 month ago

MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research

Paper • 2503.13399 • Published Mar 17 • 21

Video Action Differencing

Paper • 2503.07860 • Published Mar 10 • 33

upvoted a collection about 2 months ago

Temporal Preference Optimization

Collection

Temporal Preference Optimization for Long-form Video Understanding • 3 items • Updated Jan 19 • 5

updated a dataset 2 months ago

josaklil-ai/s1K

Updated Feb 13 • 1.89k • 12

published a dataset 2 months ago

josaklil-ai/s1K

Updated Feb 13 • 1.89k • 12

updated a dataset 2 months ago

josaklil-ai/s50K

Updated Feb 13 • 50.4k • 26

published a dataset 2 months ago

josaklil-ai/s50K

Updated Feb 13 • 50.4k • 26

upvoted a paper 3 months ago

Temporal Preference Optimization for Long-Form Video Understanding

Paper • 2501.13919 • Published Jan 23 • 22

authored 2 papers 3 months ago

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Paper • 2501.07171 • Published Jan 13 • 56

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

Paper • 2501.03225 • Published Jan 6 • 7

upvoted a paper 3 months ago

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Paper • 2501.07171 • Published Jan 13 • 56

upvoted a paper 4 months ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 147

authored a paper 10 months ago

Revisiting Active Learning in the Era of Vision Foundation Models

Paper • 2401.14555 • Published Jan 25, 2024