Multimodal Large Language Models

university

https://bzhao.me/

mllms's activity

tennant

authored 2 papers 20 days ago

Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights

Paper • 2405.21070 • Published May 31, 2024

CLIPS: An Enhanced CLIP Framework for Learning with Synthetic Captions

Paper • 2411.16828 • Published Nov 25, 2024 • 1

tennant

authored a paper 3 months ago

A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?

Paper • 2409.15277 • Published Sep 23, 2024 • 35

tennant

authored 11 papers 7 months ago

Parametric Classification for Generalized Category Discovery: A Baseline Study

Paper • 2211.11727 • Published Nov 21, 2022 • 1

Learning Semi-supervised Gaussian Mixture Models for Generalized Category Discovery

Paper • 2305.06144 • Published May 10, 2023 • 1

Improving Contrastive Learning by Visualizing Feature Transformation

Paper • 2108.02982 • Published Aug 6, 2021

Self-Supervised Visual Representation Learning with Semantic Grouping

Paper • 2205.15288 • Published May 30, 2022

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

Paper • 2311.16101 • Published Nov 27, 2023 • 1

Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning

Paper • 2312.11420 • Published Dec 18, 2023 • 2

Compress & Align: Curating Image-Text Data with Human Knowledge

Paper • 2312.06726 • Published Dec 11, 2023

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Paper • 2404.05892 • Published Apr 8, 2024 • 32

HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

Paper • 2404.09990 • Published Apr 15, 2024 • 12

Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning

Paper • 2406.12742 • Published Jun 18, 2024 • 14

What If We Recaption Billions of Web Images with LLaMA-3?

Paper • 2406.08478 • Published Jun 12, 2024 • 39

tennant

authored a paper over 1 year ago

Incremental Generalized Category Discovery

Paper • 2304.14310 • Published Apr 27, 2023