Spaces: lovodkin93/FuseReviews-Leaderboard

lovodkin93 committed on Mar 18, 2024

Commit bfeafb5 (1 parent: 731c118)

Upload 2 files

Files changed (2)

app.py ADDED

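+import gradio as gr
+import pandas as pd
+
+# Load the leaderboard table uploaded in this commit (tab-separated).
+df = pd.read_table("visit_bench_leaderboard.tsv")
+# Alternative single-image table (not uploaded in this commit):
+# df = pd.read_table("visitbench_leaderboard_Single~Image_Nov072023.tsv")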
+headline = """# VisIT-Bench Leaderboard
+To submit your results to the leaderboard, run our auto-evaluation code by following the instructions [here](https://github.com/Hritikbansal/visit_bench_sandbox). Once you are happy with the results, send them to [this email address](mailto:yonatanbitton1@gmail.com).
+Please include in your email 1) a name for your model, 2) your team name (including your affiliation), and, optionally, 3) a GitHub repo or paper link. Please also attach your predictions: you can add a "predictions" column to [this CSV](https://huggingface.co/datasets/mlfoundations/VisIT-Bench/raw/main/test/metadata.csv).
+"""
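+# Build the page: a Markdown header above the leaderboard table.
+with gr.Blocks() as demo:
+    with gr.Row():
+        gr.Markdown(headline)
+    with gr.Column():
+        # One datatype per TSV column (Category, Model, Elo, matches, win rate).
+        # The last column holds strings such as "17.81% (n=494)", so it is shown as text.
+        leaderboard_df = gr.Dataframe(
+            value=df,
+            datatype=["markdown", "markdown", "number", "number", "str"],
+        )
+
+demo.launch()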
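The win-rate column packs a percentage and a rating count into one string, for example `17.81% (n=494)`, and the reference rows contain `-`. Because of this, pandas reads the column as text rather than numbers.

Below is a minimal sketch that splits the column into two numeric columns. It assumes every non-reference row follows the `NN.NN% (n=NNN)` pattern. The helper `split_win_column` and the output columns `win_rate_pct` and `n_ratings` are hypothetical names, not part of the app. The sample rows are copied from `visit_bench_leaderboard.tsv`.

```python
import io

import pandas as pd

# Sample rows copied from visit_bench_leaderboard.tsv; fields are joined with tabs.
ROWS = [
    ["Category", "Model", "Elo", "matches", "Win vs. Reference  (w/ # ratings)"],
    ["Single Image", "Human Verified GPT-4 Reference", "1370", "5442", "-"],
    ["Single Image", "LLaVA (13B)", "1106", "5446", "17.81% (n=494)"],
    ["Multiple Images", "Otter", "911", "180", "1.69% (n=59)"],
]
TSV_TEXT = "\n".join("\t".join(row) for row in ROWS)

# Captures the percentage and the rating count in strings like "17.81% (n=494)".
# The reference row holds "-", which does not match and becomes a missing value.
WIN_PATTERN = r"^\s*([\d.]+)%\s*\(n=(\d+)\)\s*$"


def split_win_column(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df with numeric win_rate_pct and n_ratings columns (hypothetical names)."""
    parsed = df[column].astype(str).str.extract(WIN_PATTERN)
    out = df.copy()
    out["win_rate_pct"] = pd.to_numeric(parsed[0])
    # Nullable integer dtype keeps the reference row as <NA> instead of forcing floats.
    out["n_ratings"] = pd.to_numeric(parsed[1]).astype("Int64")
    return out


df = pd.read_table(io.StringIO(TSV_TEXT))
# Select the win-rate column by position so the exact header text does not matter.
scored = split_win_column(df, df.columns[-1])
print(scored[["Model", "Elo", "win_rate_pct", "n_ratings"]])
```

The original string column is kept as-is for display, so the parsed columns can be used for sorting or plotting without changing what the leaderboard shows.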

visit_bench_leaderboard.tsv ADDED

+Category	Model	Elo	Matches	Win vs. Reference (w/ # ratings)
+Single Image	Human Verified GPT-4 Reference	1370	5442	-
+Single Image	LLaVA (13B)	1106	5446	17.81% (n=494)
+Single Image	LlamaAdapter-v2 (7B)	1082	5445	13.75% (n=502)
+Single Image	mPLUG-Owl (7B)	1081	5452	15.29% (n=497)
+Single Image	InstructBLIP (13B)	1011	5444	13.73% (n=517)
+Single Image	Otter (9B)	991	5450	6.84% (n=512)
+Single Image	VisualGPT (Da Vinci 003)	972	5445	1.52% (n=527)
+Single Image	MiniGPT-4 (7B)	921	5442	3.26% (n=522)
+Single Image	OpenFlamingo (9B)	877	5449	2.86% (n=524)
+Single Image	PandaGPT (13B)	826	5441	2.63% (n=533)
+Single Image	Multimodal GPT	763	5450	0.18% (n=544)
+Multiple Images	Human Verified GPT-4 Reference	1192	180	-
+Multiple Images	mPLUG-Owl	995	180	6.67% (n=60)
+Multiple Images	Otter	911	180	1.69% (n=59)
+Multiple Images	OpenFlamingo	902	180	1.67% (n=60)
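Elo ratings are derived from pairwise match outcomes, so a gap between two ratings maps to an expected head-to-head score. This is a minimal sketch of that mapping, intended as a reading aid only. It assumes the conventional Elo logistic curve (base 10, 400-point scale); the leaderboard file does not say which parameters were used. The helper `elo_expected_score` is hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo logistic model.

    The base-10 / 400-point parameters are the conventional Elo choice and are an
    assumption here; the leaderboard file does not state them.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings copied from the "Single Image" rows of the table above.
ELO = {
    "Human Verified GPT-4 Reference": 1370,
    "LLaVA (13B)": 1106,
    "Multimodal GPT": 763,
}

reference = ELO["Human Verified GPT-4 Reference"]
for model in ("LLaVA (13B)", "Multimodal GPT"):
    p = elo_expected_score(ELO[model], reference)
    print(f"{model}: expected score vs. reference = {p:.3f}")
```

The printed values are expectations under the assumed model. They are not numbers reported in the table.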