Add bilingual support: English and Traditional Chinese (zh-TW)
Web Interface (app.py):
- Add language selector dropdown in Advanced Settings
- Update summarize_streaming() to accept output_language parameter
- Implement dynamic prompts based on selected language (see the sketch after this list)
- Conditionally apply OpenCC conversion only for zh-TW output
- Pass language parameter through submit button click handler
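
The prompt branching both entry points implement reduces to this minimal sketch; the `build_prompts` helper name is illustrative and not part of the diff, but the prompt strings are taken verbatim from it:

```python
# Illustrative helper (not in the diff): mirrors the language branching that
# summarize_streaming() in app.py and stream_summarize_transcript() gain below.
def build_prompts(transcript: str, output_language: str = "en") -> tuple[str, str]:
    if output_language == "zh-TW":
        system_msg = "你是一個有助的助手,負責總結轉錄內容。"
        user_msg = f"請總結以下內容:\n\n{transcript}"
    else:
        system_msg = "You are a helpful assistant that summarizes transcripts."
        user_msg = f"Please summarize the following content:\n\n{transcript}"
    return system_msg, user_msg
```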
CLI (summarize_transcript.py):
- Add -l/--language argument with choices [en, zh-TW], default en (usage shown after this list)
- Update stream_summarize_transcript() to handle language parameter
- Implement dynamic system/user prompts for both languages
- Conditionally apply OpenCC conversion based on language
- Update summary header to show selected language
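
For reference, the intended CLI invocations after this change (the flag and its default are taken from the summarize_transcript.py diff below):

```bash
python summarize_transcript.py -i ./transcripts/short.txt            # English (default)
python summarize_transcript.py -i ./transcripts/short.txt -l zh-TW   # Traditional Chinese
```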
Documentation (AGENTS.md):
- Update CLI examples to show -l zh-TW flag
- Add language parameter to usage patterns
- Document conditional OpenCC conversion
- Update default language notes to reflect English default
Changes: 3 files, +64/-23 lines
- AGENTS.md +11 -5
- app.py +28 -8
- summarize_transcript.py +25 -10
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -2,13 +2,14 @@
 
 ## Project Overview
 
-Tiny Scribe is a Python CLI tool and Gradio web app for summarizing transcripts using GGUF models (e.g., ERNIE, Qwen, Granite) with llama-cpp-python. It supports live streaming output and Traditional Chinese
+Tiny Scribe is a Python CLI tool and Gradio web app for summarizing transcripts using GGUF models (e.g., ERNIE, Qwen, Granite) with llama-cpp-python. It supports live streaming output and bilingual summaries (English or Traditional Chinese zh-TW) via OpenCC.
 
 ## Build / Lint / Test Commands
 
 **Run the CLI script:**
 ```bash
-python summarize_transcript.py -i ./transcripts/short.txt
+python summarize_transcript.py -i ./transcripts/short.txt           # Default English output
+python summarize_transcript.py -i ./transcripts/short.txt -l zh-TW  # Traditional Chinese output
 python summarize_transcript.py -m unsloth/Qwen3-1.7B-GGUF:Q2_K_L
 python summarize_transcript.py -c  # CPU only
 ```
@@ -162,11 +163,15 @@ for chunk in stream:
 buffer = buffer[:thinking_match.start()] + buffer[thinking_match.end():]
 ```
 
-**Chinese Text Conversion:**
+**Chinese Text Conversion (zh-TW mode only):**
 ```python
 # Convert Simplified Chinese to Traditional Chinese (Taiwan)
 converter = OpenCC('s2twp')  # s2twp = Simplified to Traditional (Taiwan)
-traditional_text = converter.convert(simplified_text)
+# Only apply when output_language == "zh-TW"
+if output_language == "zh-TW":
+    traditional_text = converter.convert(simplified_text)
+else:
+    traditional_text = simplified_text  # Skip conversion for English
 ```
 
 ## Notes for AI Agents
@@ -175,7 +180,8 @@ traditional_text = converter.convert(simplified_text)
 - When modifying, maintain the existing streaming output pattern
 - Always call `llm.reset()` after completion to ensure state isolation
 - Model format: `repo_id:quant` (e.g., `unsloth/Qwen3-1.7B-GGUF:Q2_K_L`)
-- Default language output is
+- Default language output is English (zh-TW available via `-l zh-TW` flag or web UI dropdown)
+- OpenCC conversion only applied when output_language is "zh-TW"
 - HuggingFace cache at `~/.cache/huggingface/hub/` - clean periodically
 - HF Spaces runs on CPU tier with 2 vCPUs, 16GB RAM
 - Keep model sizes under 4GB for reasonable performance on free tier
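
A self-contained sketch of the conditional conversion documented above, assuming the `opencc` Python package that provides `OpenCC` (the `localize` wrapper is illustrative, not code from this repo):

```python
from opencc import OpenCC

converter = OpenCC('s2twp')  # Simplified -> Traditional (Taiwan) with phrase conversion

def localize(text: str, output_language: str = "en") -> str:
    # Per the docs above, conversion runs only for zh-TW output.
    if output_language == "zh-TW":
        return converter.convert(text)
    return text

print(localize("语言模型", output_language="zh-TW"))  # -> 語言模型
print(localize("language model"))                     # unchanged for English
```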
--- a/app.py
+++ b/app.py
@@ -380,6 +380,7 @@ def summarize_streaming(
     max_tokens: int = 2048,
     top_p: float = None,
     top_k: int = None,
+    output_language: str = "en",
 ) -> Generator[Tuple[str, str, str], None, None]:
     """
     Stream summary generation from uploaded file.
@@ -391,6 +392,7 @@ def summarize_streaming(
         max_tokens: Maximum tokens to generate
         top_p: Nucleus sampling parameter (uses model default if None)
         top_k: Top-k sampling parameter (uses model default if None)
+        output_language: Target language for summary ("en" or "zh-TW")
 
     Yields:
         Tuple of (thinking_text, summary_text, info_text)
@@ -449,15 +451,24 @@ def summarize_streaming(
 
     # Prepare system prompt with reasoning toggle for Qwen3 models
     model = AVAILABLE_MODELS[model_key]
-    if
-
-
+    if output_language == "zh-TW":
+        if model.get("supports_toggle"):
+            reasoning_mode = "/think" if enable_reasoning else "/no_think"
+            system_content = f"你是一個有助的助手,負責總結轉錄內容。{reasoning_mode}"
+        else:
+            system_content = "你是一個有助的助手,負責總結轉錄內容。"
+        user_content = f"請總結以下內容:\n\n{transcript}"
     else:
-
+        if model.get("supports_toggle"):
+            reasoning_mode = "/think" if enable_reasoning else "/no_think"
+            system_content = f"You are a helpful assistant that summarizes transcripts. {reasoning_mode}"
+        else:
+            system_content = "You are a helpful assistant that summarizes transcripts."
+        user_content = f"Please summarize the following content:\n\n{transcript}"
 
     messages = [
         {"role": "system", "content": system_content},
-        {"role": "user", "content":
+        {"role": "user", "content": user_content},
     ]
 
     # Get model-specific inference settings
@@ -490,8 +501,11 @@ def summarize_streaming(
         delta = chunk['choices'][0].get('delta', {})
         content = delta.get('content', '')
         if content:
-
-
+            if output_language == "zh-TW":
+                converted = converter.convert(content)
+                full_response += converted
+            else:
+                full_response += content
 
         thinking, summary = parse_thinking_blocks(full_response, streaming=True)
         current_thinking = thinking or ""
@@ -732,6 +746,12 @@ def create_interface():
             label="Model",
             info="Smaller = faster. Large files need models with bigger context."
         )
+        language_selector = gr.Dropdown(
+            choices=[("English", "en"), ("Traditional Chinese (zh-TW)", "zh-TW")],
+            value="en",
+            label="Output Language",
+            info="Select target language for the summary"
+        )
         enable_reasoning = gr.Checkbox(
             value=True,
             label="Enable Reasoning Mode",
@@ -808,7 +828,7 @@ def create_interface():
     # Event handlers
     submit_btn.click(
         fn=summarize_streaming,
-        inputs=[file_input, model_dropdown, enable_reasoning, max_tokens, top_p, top_k],
+        inputs=[file_input, model_dropdown, enable_reasoning, max_tokens, top_p, top_k, language_selector],
        outputs=[thinking_output, summary_output, info_output],
        show_progress="full"
    )
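
Gradio passes each component listed in `inputs` as a positional argument, so appending `language_selector` lines up with the new trailing `output_language` parameter of `summarize_streaming()`. A stripped-down sketch of that wiring (the component set is reduced here; the real handler takes the full input list shown above):

```python
import gradio as gr

def summarize(file_path, output_language="en"):
    # Stand-in for the real streaming summarizer in app.py.
    yield f"[{output_language}] summary placeholder for {file_path}"

with gr.Blocks() as demo:
    file_input = gr.File(label="Transcript file")
    language_selector = gr.Dropdown(
        # (label, value) pairs: the handler receives the value, "en" or "zh-TW".
        choices=[("English", "en"), ("Traditional Chinese (zh-TW)", "zh-TW")],
        value="en",
        label="Output Language",
    )
    summary_output = gr.Textbox(label="Summary")
    submit_btn = gr.Button("Summarize")
    submit_btn.click(fn=summarize,
                     inputs=[file_input, language_selector],
                     outputs=summary_output)

demo.launch()
```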
--- a/summarize_transcript.py
+++ b/summarize_transcript.py
@@ -63,24 +63,33 @@ def parse_thinking_blocks(content: str) -> Tuple[str, str]:
 
     return (thinking, summary)
 
-def stream_summarize_transcript(llm, transcript):
+def stream_summarize_transcript(llm, transcript, output_language="en"):
     """
     Perform live streaming summary by getting real-time token output from the model.
 
     Args:
         llm: The loaded language model
         transcript: The full transcript to summarize
+        output_language: Target language for summary ("en" or "zh-TW")
     """
     cc = OpenCC('s2twp')  # Simplified Chinese to Traditional Chinese (Taiwan standard with phrase conversion)
 
-    # Use the model's chat format based on its template
+    # Use the model's chat format based on its template and language
+    if output_language == "zh-TW":
+        system_msg = "你是一個有助的助手,負責總結轉錄內容。"
+        user_msg = f"請總結以下內容:\n\n{transcript}"
+    else:
+        system_msg = "You are a helpful assistant that summarizes transcripts."
+        user_msg = f"Please summarize the following content:\n\n{transcript}"
+
     messages = [
-        {"role": "system", "content":
-        {"role": "user", "content":
+        {"role": "system", "content": system_msg},
+        {"role": "user", "content": user_msg}
     ]
 
     # Generate the summary using streaming completion
-
+    lang_display = "zh-TW" if output_language == "zh-TW" else "English"
+    print(f"\nStreaming {lang_display} summary:")
     print("="*50)
 
     full_response = ""
@@ -101,9 +110,13 @@ def stream_summarize_transcript(llm, transcript):
         delta = chunk['choices'][0].get('delta', {})
         content = delta.get('content', '')
         if content:
-
-
-
+            if output_language == "zh-TW":
+                converted_content = cc.convert(content)
+                print(converted_content, end='', flush=True)
+                full_response += converted_content
+            else:
+                print(content, end='', flush=True)
+                full_response += content
 
     print("\n" + "="*50)
 
@@ -122,6 +135,8 @@ def main():
                         default="unsloth/Qwen3-0.6B-GGUF:Q4_0",
                         help="HuggingFace model in format repo_id:quant (e.g., unsloth/Qwen3-1.7B-GGUF:Q2_K_L)")
     parser.add_argument("-c", "--cpu", action="store_true", help="Force CPU only inference")
+    parser.add_argument("-l", "--language", type=str, choices=["en", "zh-TW"], default="en",
+                        help="Output language (default: en)")
     args = parser.parse_args()
 
     # Parse model argument if provided
@@ -148,8 +163,8 @@ def main():
     print("\nOriginal Transcript (Preview):")
     print(transcript[:500] + "..." if len(transcript) > 500 else transcript)
 
-    # Summarize
-    summary = stream_summarize_transcript(llm, transcript)
+    # Summarize with streaming
+    summary = stream_summarize_transcript(llm, transcript, output_language=args.language)
 
     # Save summaries to files
     # Parse thinking blocks and separate content
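
A quick standalone check of the new flag's semantics, using the same `add_argument` call as in the diff:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-l", "--language", type=str, choices=["en", "zh-TW"], default="en",
                    help="Output language (default: en)")

assert parser.parse_args([]).language == "en"                  # default is English
assert parser.parse_args(["-l", "zh-TW"]).language == "zh-TW"  # explicit Traditional Chinese
# Any other value, e.g. ["-l", "fr"], fails argparse's choices check and exits with an error.
```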