lioushz

Shz

Shz's activity

upvoted a paper 16 days ago

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published 17 days ago • 69

updated 2 datasets 17 days ago

opencompass/AIME2025

Viewer • Updated 17 days ago • 30 • 2.61k • 10

Shz/aime_tmp

Viewer • Updated 17 days ago • 30 • 59

published a dataset 17 days ago

Shz/aime_tmp

Viewer • Updated 17 days ago • 30 • 59

published a model 20 days ago

Shz/DeepSeek-R1-Distill-Qwen-1.5B-GRPO

Updated 20 days ago

liked a dataset about 1 month ago

opencompass/AIME2025

Viewer • Updated 17 days ago • 30 • 2.61k • 10

published a dataset about 1 month ago

opencompass/AIME2025

Viewer • Updated 17 days ago • 30 • 2.61k • 10

liked a dataset 2 months ago

opencompass/LiveMathBench

Viewer • Updated 16 days ago • 283 • 1.02k • 4

upvoted a paper 3 months ago

Are Your LLMs Capable of Stable Reasoning?

Paper • 2412.13147 • Published Dec 17, 2024 • 92

updated a dataset 4 months ago

opencompass/mmmlu_lite

Viewer • Updated Nov 1, 2024 • 20k • 213 • 2

liked a dataset 4 months ago

opencompass/mmmlu_lite

Viewer • Updated Nov 1, 2024 • 20k • 213 • 2

upvoted a paper 5 months ago

CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution

Paper • 2410.16256 • Published Oct 21, 2024 • 60

liked a Space 5 months ago

Open VLM Video Leaderboard

Space • 🌎 • 101

VLMEvalKit Eval Results in video understanding benchmark

upvoted a paper 6 months ago

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

Paper • 2409.16191 • Published Sep 24, 2024 • 42

liked a dataset 7 months ago

MU-NLPC/Calc-gsm8k

Viewer • Updated Oct 30, 2023 • 17.6k • 868 • 5

upvoted a paper 8 months ago

NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

Paper • 2407.11963 • Published Jul 16, 2024 • 44

liked a Space 8 months ago

OpenGPT 4o

Space • 🔥 • 4.33k

GPT 4o like bot.

upvoted 2 papers 9 months ago

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

Paper • 2406.14515 • Published Jun 20, 2024 • 33

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

Paper • 2406.14544 • Published Jun 20, 2024 • 35

liked a model about 2 years ago

valhalla/bart-large-finetuned-squadv1

Question Answering • Updated Jun 14, 2021 • 560 • 7