Youngtaek Oh

ytaek-oh

https://ytaek-oh.github.io

AI & ML interests

Vision and Language, Multimodality, Compositionality

Recent Activity

updated a Space about 2 months ago

ytaek-oh/table

published a Space about 2 months ago

ytaek-oh/table

updated a collection 3 months ago

VLM Papers

Organizations

None yet

ytaek-oh's activity

updated a Space about 2 months ago

Table

Display tables in a web app

published a Space about 2 months ago

Table

updated a collection 3 months ago

VLM Papers

Collection

Save it, later read • 14 items • Updated Dec 31, 2024 • 2

updated a collection 4 months ago

VLM Papers

upvoted a paper 4 months ago

ColPali: Efficient Document Retrieval with Vision Language Models

Paper • 2407.01449 • Published Jun 27, 2024 • 46

updated a collection 4 months ago

VLM Papers

upvoted a paper 4 months ago

jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images

Paper • 2412.08802 • Published Dec 11, 2024 • 5

updated a collection 4 months ago

VLM Papers

upvoted a paper 4 months ago

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption

Paper • 2412.09283 • Published Dec 12, 2024 • 19

updated a collection 4 months ago

VLM Papers

upvoted a paper 4 months ago

V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding

Paper • 2412.09616 • Published Dec 12, 2024 • 1

updated a collection 4 months ago

VLM Papers

upvoted a paper 4 months ago

LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations

Paper • 2412.08580 • Published Dec 11, 2024 • 45

updated a collection 4 months ago

VLM Papers

upvoted a paper 4 months ago

FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training

Paper • 2411.11927 • Published Nov 18, 2024 • 1

updated a collection 4 months ago

VLM Papers

upvoted a paper 4 months ago

CLIPS: An Enhanced CLIP Framework for Learning with Synthetic Captions

Paper • 2411.16828 • Published Nov 25, 2024 • 1

updated a collection 4 months ago

VLM Papers

upvoted a paper 4 months ago

COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training

Paper • 2412.01814 • Published Dec 2, 2024 • 1

updated a collection 4 months ago

VLM Papers