studyOverflow committed on
Commit 1357e34 · verified · 1 Parent(s): f103e62

init: minimal Gradio annotation template (6 models, 3504 items)

Files changed (3)
  1. README.md +24 -7
  2. app.py +326 -0
  3. requirements.txt +2 -0
README.md CHANGED
@@ -1,14 +1,31 @@
  ---
- title: MBenchAnnotation
- emoji:
- colorFrom: yellow
+ title: MBench Annotation
+ emoji: 🎬
+ colorFrom: blue
  colorTo: purple
  sdk: gradio
- sdk_version: 6.13.0
+ sdk_version: 4.44.0
  app_file: app.py
  pinned: false
- license: mit
- short_description: A space for annotation of MemoryBench
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # MBench-V Human Annotation
+
+ Gradio-based annotation UI for the MBench-V video generation benchmark.
+
+ - **Video source (read-only)**: [studyOverflow/TempMemoryData](https://huggingface.co/datasets/studyOverflow/TempMemoryData), streamed directly from HF CDN — videos are **not** copied into this Space.
+ - **Annotation sink (write)**: the same dataset repo, under `annotations/`. Submissions are batched by `CommitScheduler` and pushed every 5 minutes.
+ - **Models included (6)**: `causal_forcing`, `self_forcing`, `cosmos`, `helios`, `longlive`, `memflow`. `skyreels` and `longcat` are temporarily excluded because their 0422 generation is still in progress.
+ - **Tasks**: 584 task_ids × 6 models = **3504** `(model, task_id)` pairs.
+
+ ## How to use
+
+ 1. Enter your annotator name (anything unique — used to tag your submissions).
+ 2. Watch the video on the left; read the prompt and metadata in the middle.
+ 3. Give a score (1–5) and an optional note on the right.
+ 4. Click **Submit & Next** to move on. Your submissions are auto-committed every 5 min.
+
+ ## Notes
+
+ - This is a minimal template. Multi-annotator deduplication, per-user task-allocation, and per-dimension scoring are **not** implemented yet — all annotators currently get a randomly shuffled pool and see tasks in their own order.
+ - The environment variable `HF_TOKEN` must be set in the Space *Settings → Variables and secrets* with **write** access to `studyOverflow/TempMemoryData`.
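The pool arithmetic and per-annotator ordering described in the README can be sketched in isolation. This is a minimal illustration, not the Space's code: the task IDs, the helper names `build_pool` and `shuffled_order`, and the fixed seed strings are hypothetical stand-ins (the real task IDs come from `merged.json`, and the app seeds with the annotator name plus the current time).

```python
import random

MODELS = ["causal_forcing", "self_forcing", "cosmos", "helios", "longlive", "memflow"]
# Hypothetical task IDs; the real ones are read from merged.json.
TASK_IDS = [f"task_{i:04d}" for i in range(584)]


def build_pool(models, task_ids):
    """Cross every model with every task: one (model, task_id) pair per video."""
    return [(m, t) for m in models for t in task_ids]


def shuffled_order(pool_size, seed):
    """Return a permutation of pool indices that is reproducible for a given seed."""
    order = list(range(pool_size))
    random.Random(seed).shuffle(order)
    return order


pool = build_pool(MODELS, TASK_IDS)
# Fixed seeds here for reproducibility of the sketch only.
order_alice = shuffled_order(len(pool), "alice-1700000000")
order_bob = shuffled_order(len(pool), "bob-1700000000")

print(len(pool))               # 584 tasks x 6 models
print(pool[order_alice[0]])    # first item alice would see
```

Because every annotator walks a full permutation of the same pool, each annotator can eventually reach all items, but nothing prevents two annotators from scoring the same pair. That matches the "no deduplication yet" caveat in the notes.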
app.py ADDED
@@ -0,0 +1,326 @@
+ """MBench-V annotation UI (Gradio Space).
+
+ Reads videos streaming from the `studyOverflow/TempMemoryData` dataset repo,
+ writes annotations back to the same repo under `annotations/`, batched via
+ `CommitScheduler`.
+
+ Design notes
+ ------------
+ - Videos are NOT copied into this Space. We build CDN URLs with
+   `hf_hub_url(..., repo_type="dataset")` and let the browser stream them.
+ - Submissions are appended to a per-process JSONL file under `annotations/`;
+   `CommitScheduler` pushes the directory to the dataset repo every 5 min.
+ - Allocation is intentionally simple in this template: at start-up we build
+   a single shuffled pool of `(model, task_id)` pairs, and each user session
+   maintains its own index into that pool. Multi-annotator deduplication is
+   out of scope for the first iteration.
+ """
+
+ from __future__ import annotations
+
+ import json
+ import os
+ import random
+ import time
+ import uuid
+ from pathlib import Path
+ from typing import Any
+
+ import gradio as gr
+ from huggingface_hub import CommitScheduler, hf_hub_download, hf_hub_url
+
+ # ---------------------------------------------------------------------------
+ # Config
+ # ---------------------------------------------------------------------------
+
+ DATASET_REPO = "studyOverflow/TempMemoryData"
+ MERGED_JSON_PATH = "MBench-V/merged.json"
+
+ # 6 models that are already fully reorganized on HF (584 videos each).
+ # `skyreels` and `longcat` are excluded until their 0422 runs finish.
+ MODELS: list[str] = [
+     "causal_forcing",
+     "self_forcing",
+     "cosmos",
+     "helios",
+     "longlive",
+     "memflow",
+ ]
+
+ HF_TOKEN = os.environ.get("HF_TOKEN")  # must be set in Space secrets for writes
+
+ # Local staging directory that CommitScheduler will sync to the dataset repo.
+ ANN_DIR = Path("annotations_local")
+ ANN_DIR.mkdir(exist_ok=True)
+
+ # Each Space process writes to its own JSONL so concurrent replicas don't
+ # clobber each other's writes. `CommitScheduler` pushes the whole directory.
+ PROCESS_ID = uuid.uuid4().hex[:8]
+ ANN_FILE = ANN_DIR / f"ann_{PROCESS_ID}.jsonl"
+
+ COMMIT_INTERVAL_MIN = 5
+
+
+ # ---------------------------------------------------------------------------
+ # Load merged.json (584 task records) once at startup
+ # ---------------------------------------------------------------------------
+
+ def _load_merged() -> list[dict[str, Any]]:
+     local = hf_hub_download(
+         repo_id=DATASET_REPO,
+         filename=MERGED_JSON_PATH,
+         repo_type="dataset",
+         token=HF_TOKEN,
+     )
+     with open(local, encoding="utf-8") as f:
+         return json.load(f)
+
+
+ TASKS: list[dict[str, Any]] = _load_merged()
+ TASK_BY_ID: dict[str, dict[str, Any]] = {t["task_id"]: t for t in TASKS}
+
+
+ def _extract_prompt(task: dict[str, Any]) -> str:
+     """Return the first non-empty prompt string found in the task record."""
+     gp = task.get("generation_prompts") or {}
+     prompts = gp.get("prompts") or {}
+     for level in ("level_1", "level_2", "level_3"):
+         val = prompts.get(level)
+         if isinstance(val, list) and val:
+             return val[0]
+         if isinstance(val, str) and val:
+             return val
+     return "(no prompt found)"
+
+
+ # ---------------------------------------------------------------------------
+ # Build the (model, task_id) pool
+ # ---------------------------------------------------------------------------
+
+ def _build_pool() -> list[tuple[str, str]]:
+     pool: list[tuple[str, str]] = []
+     for m in MODELS:
+         for t in TASKS:
+             pool.append((m, t["task_id"]))
+     return pool
+
+
+ POOL: list[tuple[str, str]] = _build_pool()
+ print(f"[mbench-ann] loaded {len(TASKS)} tasks × {len(MODELS)} models = {len(POOL)} items")
+
+
+ def _video_url(model: str, task_id: str) -> str:
+     return hf_hub_url(
+         DATASET_REPO,
+         filename=f"MBench-V/{model}/videos/{task_id}.mp4",
+         repo_type="dataset",
+     )
+
+
+ # ---------------------------------------------------------------------------
+ # CommitScheduler — pushes annotations_local/ to DATASET_REPO every 5 min
+ # ---------------------------------------------------------------------------
+
+ scheduler: CommitScheduler | None = None
+ if HF_TOKEN:
+     scheduler = CommitScheduler(
+         repo_id=DATASET_REPO,
+         repo_type="dataset",
+         folder_path=str(ANN_DIR),
+         path_in_repo="annotations",
+         every=COMMIT_INTERVAL_MIN,
+         token=HF_TOKEN,
+         private=False,
+         squash_history=False,
+     )
+     print(f"[mbench-ann] CommitScheduler started (every {COMMIT_INTERVAL_MIN} min)")
+ else:
+     print("[mbench-ann] WARNING: HF_TOKEN not set — annotations will stay local only")
+
+
+ def _append_annotation(record: dict[str, Any]) -> None:
+     line = json.dumps(record, ensure_ascii=False)
+     if scheduler is not None:
+         with scheduler.lock:
+             with ANN_FILE.open("a", encoding="utf-8") as f:
+                 f.write(line + "\n")
+     else:
+         with ANN_FILE.open("a", encoding="utf-8") as f:
+             f.write(line + "\n")
+
+
+ # ---------------------------------------------------------------------------
+ # UI helpers
+ # ---------------------------------------------------------------------------
+
+ def _format_meta(model: str, task: dict[str, Any], idx: int, total: int) -> str:
+     lines = [
+         f"**Progress**: {idx + 1} / {total}",
+         f"**Model**: `{model}`",
+         f"**task_id**: `{task['task_id']}`",
+         f"**category**: `{task.get('category', '?')}` • **subcategory**: `{task.get('subcategory', '?')}`",
+         f"**source_task**: `{task.get('source_task', '?')}`",
+     ]
+     if task.get("task_type"):
+         lines.append(f"**task_type**: `{task['task_type']}`")
+     return "\n\n".join(lines)
+
+
+ def _load_item(pool_order: list[int], idx: int) -> tuple[str, str, str]:
+     """Return (video_url, meta_markdown, prompt_text) for position `idx`."""
+     if idx < 0 or idx >= len(pool_order):
+         return "", "**All done!** No more items.", ""
+     model, task_id = POOL[pool_order[idx]]
+     task = TASK_BY_ID[task_id]
+     return (
+         _video_url(model, task_id),
+         _format_meta(model, task, idx, len(pool_order)),
+         _extract_prompt(task),
+     )
+
+
+ # ---------------------------------------------------------------------------
+ # Gradio callbacks
+ # ---------------------------------------------------------------------------
+
+ def start_session(annotator: str, state: dict | None):
+     annotator = (annotator or "").strip()
+     if not annotator:
+         return (
+             state,
+             gr.update(visible=True),   # login panel stays
+             gr.update(visible=False),  # annotation panel hidden
+             "",
+             "",
+             "",
+             gr.update(value="Please enter a name first."),
+         )
+     # Build this user's shuffled order
+     order = list(range(len(POOL)))
+     rng = random.Random(f"{annotator}-{int(time.time())}")
+     rng.shuffle(order)
+     state = {"annotator": annotator, "order": order, "idx": 0}
+     video, meta, prompt = _load_item(order, 0)
+     return (
+         state,
+         gr.update(visible=False),
+         gr.update(visible=True),
+         video,
+         meta,
+         prompt,
+         gr.update(value=f"Logged in as `{annotator}`"),
+     )
+
+
+ def _advance(state: dict, record_submitted: bool):
+     state["idx"] += 1
+     video, meta, prompt = _load_item(state["order"], state["idx"])
+     status = (
+         f"Submitted ({state['idx']} done). Next →"
+         if record_submitted
+         else f"Skipped. Next →"
+     )
+     # Reset score + note controls
+     return state, video, meta, prompt, 3, "", status
+
+
+ def submit_and_next(state: dict, score: int, note: str):
+     if state is None or state.get("idx") is None:
+         return state, "", "", "", 3, "", "Not logged in."
+     order = state["order"]
+     idx = state["idx"]
+     if idx >= len(order):
+         return state, "", "**All done!**", "", 3, "", "No more items."
+     model, task_id = POOL[order[idx]]
+     record = {
+         "timestamp": time.time(),
+         "timestamp_iso": time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime()),
+         "annotator": state["annotator"],
+         "process_id": PROCESS_ID,
+         "model": model,
+         "task_id": task_id,
+         "score": int(score),
+         "note": (note or "").strip(),
+     }
+     _append_annotation(record)
+     return _advance(state, record_submitted=True)
+
+
+ def skip_and_next(state: dict):
+     if state is None or state.get("idx") is None:
+         return state, "", "", "", 3, "", "Not logged in."
+     return _advance(state, record_submitted=False)
+
+
+ # ---------------------------------------------------------------------------
+ # Gradio UI
+ # ---------------------------------------------------------------------------
+
+ THEME = gr.themes.Soft(primary_hue="indigo")
+
+
+ with gr.Blocks(theme=THEME, title="MBench-V Annotation") as demo:
+     gr.Markdown(
+         """
+         # 🎬 MBench-V Annotation
+
+         Watch each generated video and rate it **1–5** (5 = best). Click **Submit & Next** to save.
+         Your submissions are auto-committed to the dataset repo every 5 minutes.
+         """
+     )
+
+     session_state = gr.State(value=None)
+
+     # ---- Login panel ----
+     with gr.Group(visible=True) as login_panel:
+         with gr.Row():
+             annotator_in = gr.Textbox(
+                 label="Annotator name", placeholder="e.g. alice",
+                 scale=4, autofocus=True,
+             )
+             login_btn = gr.Button("Start annotating", variant="primary", scale=1)
+
+     # ---- Annotation panel ----
+     with gr.Group(visible=False) as ann_panel:
+         with gr.Row():
+             with gr.Column(scale=3):
+                 video = gr.Video(label="Generated video", autoplay=True, loop=True)
+             with gr.Column(scale=2):
+                 meta_md = gr.Markdown()
+                 prompt_tb = gr.Textbox(
+                     label="Generation prompt",
+                     lines=10, max_lines=20, interactive=False,
+                 )
+             with gr.Column(scale=1):
+                 score = gr.Slider(1, 5, value=3, step=1, label="Score (1 worst – 5 best)")
+                 note = gr.Textbox(label="Note (optional)", lines=4)
+                 submit_btn = gr.Button("✅ Submit & Next", variant="primary")
+                 skip_btn = gr.Button("⏭️ Skip")
+
+     status = gr.Markdown("")
+
+     # ---- Wiring ----
+     login_btn.click(
+         start_session,
+         inputs=[annotator_in, session_state],
+         outputs=[session_state, login_panel, ann_panel, video, meta_md, prompt_tb, status],
+     )
+     annotator_in.submit(
+         start_session,
+         inputs=[annotator_in, session_state],
+         outputs=[session_state, login_panel, ann_panel, video, meta_md, prompt_tb, status],
+     )
+     submit_btn.click(
+         submit_and_next,
+         inputs=[session_state, score, note],
+         outputs=[session_state, video, meta_md, prompt_tb, score, note, status],
+     )
+     skip_btn.click(
+         skip_and_next,
+         inputs=[session_state],
+         outputs=[session_state, video, meta_md, prompt_tb, score, note, status],
+     )
+
+
+ if __name__ == "__main__":
+     demo.queue(default_concurrency_limit=8).launch()
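The append-under-lock pattern used by `_append_annotation` can be tried without a Hub token. The sketch below is an assumption-laden stand-in, not the Space's code: a plain `threading.Lock` takes the place of `scheduler.lock`, the file lives in a temporary directory, and `append_record` / `read_records` are hypothetical helper names. It shows why serialising writers keeps each JSONL line intact when several callbacks fire at once.

```python
import json
import tempfile
import threading
from pathlib import Path

# Stand-in for `scheduler.lock`; the real lock also guards the scheduler's uploads.
lock = threading.Lock()


def append_record(path: Path, record: dict) -> None:
    """Serialise one record as a single JSON line and append it under the lock."""
    line = json.dumps(record, ensure_ascii=False)
    with lock:
        with path.open("a", encoding="utf-8") as f:
            f.write(line + "\n")


def read_records(path: Path) -> list[dict]:
    """Parse the JSONL file back into a list of dicts, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(raw) for raw in f if raw.strip()]


# Temporary staging file, analogous to annotations_local/ann_<process_id>.jsonl.
ann_file = Path(tempfile.mkdtemp()) / "ann_demo.jsonl"

# Simulate 20 concurrent submissions (hypothetical task IDs and scores).
threads = [
    threading.Thread(
        target=append_record,
        args=(ann_file, {"annotator": "alice", "model": "cosmos",
                         "task_id": f"task_{i}", "score": 3}),
    )
    for i in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

records = read_records(ann_file)
print(len(records))
```

The lock only coordinates writers inside one process. Separate replicas are kept apart by the per-process file name, which is why the app derives `ANN_FILE` from `PROCESS_ID`.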
requirements.txt ADDED
@@ -0,0 +1,2 @@
+ gradio==4.44.0
+ huggingface_hub>=0.24.0