PicoAudio2

rookie9 committed on Oct 10

Commit 0160e18 (verified)

1 Parent(s): 8ae83f3

Update app.py

Files changed (1)

app.py +32 -9

@@ -99,7 +99,7 @@ demo = gr.Interface(
     fn=infer,
     inputs=[
         gr.Textbox(label="TCC (necessary)", value="a dog barks"),
-        gr.Textbox(label="TDC (optional, see format)", value="random"),
         gr.Textbox(label="Length (seconds, optional)", value="10.0"),
         gr.Checkbox(label="Enable Time Control", value=False),
     ],
@@ -108,14 +108,37 @@ demo = gr.Interface(
         gr.Textbox(label="Final TDC Used (input_onset)")
     ],
     title="PicoAudio2 Online Inference",
-    description=(
-        "TCC (temporal coarse caption) is necessary to generate audio. "
-        "If you need time control, please enter TDC and length (temporal detailed caption, in seconds). "
-        "Alternatively, you can let the LLM generate TDC, but API quota limits may affect availability. "
-        "TDC format: \"event1(start1-end1, start2-end2); event2(start1-end1, start2-end2...)\", for example: "
-        "\"a dog barks(1.0-2.0, 3.0-4.0); a man speaks(5.0-6.0)\""
-        "If the format of TDC is wrong or no input length, the model will generate audio without temporal control. Sorry!"
-    )
 )
 if __name__ == "__main__":
     demo.launch()

     fn=infer,
     inputs=[
         gr.Textbox(label="TCC (necessary)", value="a dog barks"),
+        gr.Textbox(label="TDC (optional, see format)", value="a dog barks(3.0-4.0, 6.0-7.0)"),
         gr.Textbox(label="Length (seconds, optional)", value="10.0"),
         gr.Checkbox(label="Enable Time Control", value=False),
     ],
         gr.Textbox(label="Final TDC Used (input_onset)")
     ],
     title="PicoAudio2 Online Inference",
+    description="""
+## Definition
+**TCC (Temporal Coarse Caption):**
+A brief text description for the overall audio scene.
+*Example*: `a dog barks`
+**TDC (Temporal Detailed Caption):**
+A **caption with timestamp information** for each event.
+It allows precise temporal control over when events happen in the generated audio.
+*Example*: `a dog barks(1.0-2.0, 3.0-4.0); a man speaks(5.0-6.0)`
+---
+## Input Requirements & Format
+- **TCC** is **required** for audio generation.
+- **TDC** is **optional**. If provided, it should follow the format:  event1(start1-end1, start2-end2); event2(start1-end1, ...)
+        *Example*:  a dog barks(1.0-2.0, 3.0-4.0); a man speaks(5.0-6.0)
+- **Length** (in seconds) is optional, but recommended for temporal control.
+- **Enable Time Control**: Tick to use TDC and length for precise event timing.
+---
+## Notes
+- If TDC format is incorrect or length is missing, the model will generate audio **without precise temporal control**.
+- For general audio generation, it is recommended to input '"random"' for TDC.
+- You may leave TDC blank to let the LLM generate timestamps automatically (subject to API quota).
+---
+"""
 )
 if __name__ == "__main__":
     demo.launch()
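
The new description documents the TDC format as `event1(start1-end1, start2-end2); event2(start1-end1, ...)` and says a malformed TDC falls back to generation without temporal control. This diff does not show how `app.py` actually validates the string, so the following is only a minimal sketch of one way such a check could look. The name `parse_tdc` and both regular expressions are hypothetical and do not come from the repository.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical helper, NOT part of app.py: it only illustrates the TDC format
# described in the new interface text.
# One event looks like: "a dog barks(1.0-2.0, 3.0-4.0)"
_EVENT_RE = re.compile(r"^\s*(?P<name>[^()]+?)\s*\((?P<spans>[^()]*)\)\s*$")
# One time span looks like: "1.0-2.0"
_SPAN_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*$")


def parse_tdc(tdc: str) -> Dict[str, List[Tuple[float, float]]]:
    """Parse a TDC string into {event name: [(start, end), ...]}.

    Raises ValueError when the string does not follow the documented format.
    """
    events: Dict[str, List[Tuple[float, float]]] = {}
    for chunk in tdc.split(";"):
        if not chunk.strip():
            continue  # tolerate a trailing ';'
        event_match = _EVENT_RE.match(chunk)
        if event_match is None:
            raise ValueError(f"malformed event: {chunk!r}")
        spans: List[Tuple[float, float]] = []
        for piece in event_match.group("spans").split(","):
            span_match = _SPAN_RE.match(piece)
            if span_match is None:
                raise ValueError(f"malformed time span: {piece!r}")
            start, end = float(span_match.group(1)), float(span_match.group(2))
            if end <= start:
                raise ValueError(f"span must end after it starts: {piece!r}")
            spans.append((start, end))
        # Merge spans if the same event name appears more than once.
        events.setdefault(event_match.group("name"), []).extend(spans)
    if not events:
        raise ValueError("no events found")
    return events
```

A caller could wrap `parse_tdc` in `try/except ValueError` and disable time control on failure, which would match the fallback behaviour described in the Notes. Under this sketch the special value `random` is rejected as malformed, so a real implementation would need to handle it before parsing.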