Add bilingual support: English and Traditional Chinese (zh-TW)
Web Interface (app.py):
- Add language selector dropdown in Advanced Settings
- Update summarize_streaming() to accept output_language parameter
- Implement dynamic prompts based on selected language (see the sketch after this list)
- Conditionally apply OpenCC conversion only for zh-TW output
- Pass language parameter through submit button click handler
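
The prompt branching both entry points implement reduces to this minimal sketch; the `build_prompts` helper name is illustrative and not part of the diff, but the prompt strings are taken verbatim from it:

```python
# Illustrative helper (not in the diff): mirrors the language branching that
# summarize_streaming() in app.py and stream_summarize_transcript() gain below.
def build_prompts(transcript: str, output_language: str = "en") -> tuple[str, str]:
    if output_language == "zh-TW":
        system_msg = "你是一個有助的助手,負責總結轉錄內容。"
        user_msg = f"請總結以下內容:\n\n{transcript}"
    else:
        system_msg = "You are a helpful assistant that summarizes transcripts."
        user_msg = f"Please summarize the following content:\n\n{transcript}"
    return system_msg, user_msg
```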
CLI (summarize_transcript.py):
- Add -l/--language argument with choices [en, zh-TW], default en (usage shown after this list)
- Update stream_summarize_transcript() to handle language parameter
- Implement dynamic system/user prompts for both languages
- Conditionally apply OpenCC conversion based on language
- Update summary header to show selected language
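
For reference, the intended CLI invocations after this change (the flag and its default are taken from the summarize_transcript.py diff below):

```bash
python summarize_transcript.py -i ./transcripts/short.txt            # English (default)
python summarize_transcript.py -i ./transcripts/short.txt -l zh-TW   # Traditional Chinese
```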
Documentation (AGENTS.md):
- Update CLI examples to show -l zh-TW flag
- Add language parameter to usage patterns
- Document conditional OpenCC conversion
- Update default language notes to reflect English default
Changes: 3 files, +64/-23 lines
- AGENTS.md +11 -5
- app.py +28 -8
- summarize_transcript.py +25 -10
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -2,13 +2,14 @@
 
 ## Project Overview
 
-Tiny Scribe is a Python CLI tool and Gradio web app for summarizing transcripts using GGUF models (e.g., ERNIE, Qwen, Granite) with llama-cpp-python. It supports live streaming output and Traditional Chinese
+Tiny Scribe is a Python CLI tool and Gradio web app for summarizing transcripts using GGUF models (e.g., ERNIE, Qwen, Granite) with llama-cpp-python. It supports live streaming output and bilingual summaries (English or Traditional Chinese zh-TW) via OpenCC.
 
 ## Build / Lint / Test Commands
 
 **Run the CLI script:**
 ```bash
-python summarize_transcript.py -i ./transcripts/short.txt
+python summarize_transcript.py -i ./transcripts/short.txt           # Default English output
+python summarize_transcript.py -i ./transcripts/short.txt -l zh-TW  # Traditional Chinese output
 python summarize_transcript.py -m unsloth/Qwen3-1.7B-GGUF:Q2_K_L
 python summarize_transcript.py -c  # CPU only
 ```
@@ -162,11 +163,15 @@ for chunk in stream:
 buffer = buffer[:thinking_match.start()] + buffer[thinking_match.end():]
 ```
 
-**Chinese Text Conversion:**
+**Chinese Text Conversion (zh-TW mode only):**
 ```python
 # Convert Simplified Chinese to Traditional Chinese (Taiwan)
 converter = OpenCC('s2twp')  # s2twp = Simplified to Traditional (Taiwan)
-traditional_text = converter.convert(simplified_text)
+# Only apply when output_language == "zh-TW"
+if output_language == "zh-TW":
+    traditional_text = converter.convert(simplified_text)
+else:
+    traditional_text = simplified_text  # Skip conversion for English
 ```
 
 ## Notes for AI Agents
@@ -175,7 +180,8 @@ traditional_text = converter.convert(simplified_text)
 - When modifying, maintain the existing streaming output pattern
 - Always call `llm.reset()` after completion to ensure state isolation
 - Model format: `repo_id:quant` (e.g., `unsloth/Qwen3-1.7B-GGUF:Q2_K_L`)
-- Default language output is
+- Default language output is English (zh-TW available via `-l zh-TW` flag or web UI dropdown)
+- OpenCC conversion only applied when output_language is "zh-TW"
 - HuggingFace cache at `~/.cache/huggingface/hub/` - clean periodically
 - HF Spaces runs on CPU tier with 2 vCPUs, 16GB RAM
 - Keep model sizes under 4GB for reasonable performance on free tier
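
A self-contained sketch of the conditional conversion documented above, assuming the `opencc` Python package that provides `OpenCC` (the `localize` wrapper is illustrative, not code from this repo):

```python
from opencc import OpenCC

converter = OpenCC('s2twp')  # Simplified -> Traditional (Taiwan) with phrase conversion

def localize(text: str, output_language: str = "en") -> str:
    # Per the docs above, conversion runs only for zh-TW output.
    if output_language == "zh-TW":
        return converter.convert(text)
    return text

print(localize("语言模型", output_language="zh-TW"))  # -> 語言模型
print(localize("language model"))                     # unchanged for English
```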
--- a/app.py
+++ b/app.py
@@ -380,6 +380,7 @@ def summarize_streaming(
     max_tokens: int = 2048,
     top_p: float = None,
     top_k: int = None,
+    output_language: str = "en",
 ) -> Generator[Tuple[str, str, str], None, None]:
     """
     Stream summary generation from uploaded file.
@@ -391,6 +392,7 @@ def summarize_streaming(
         max_tokens: Maximum tokens to generate
         top_p: Nucleus sampling parameter (uses model default if None)
         top_k: Top-k sampling parameter (uses model default if None)
+        output_language: Target language for summary ("en" or "zh-TW")
 
     Yields:
         Tuple of (thinking_text, summary_text, info_text)
@@ -449,15 +451,24 @@ def summarize_streaming(
 
     # Prepare system prompt with reasoning toggle for Qwen3 models
     model = AVAILABLE_MODELS[model_key]
-    if
-
-
+    if output_language == "zh-TW":
+        if model.get("supports_toggle"):
+            reasoning_mode = "/think" if enable_reasoning else "/no_think"
+            system_content = f"你是一個有助的助手,負責總結轉錄內容。{reasoning_mode}"
+        else:
+            system_content = "你是一個有助的助手,負責總結轉錄內容。"
+        user_content = f"請總結以下內容:\n\n{transcript}"
     else:
-
+        if model.get("supports_toggle"):
+            reasoning_mode = "/think" if enable_reasoning else "/no_think"
+            system_content = f"You are a helpful assistant that summarizes transcripts. {reasoning_mode}"
+        else:
+            system_content = "You are a helpful assistant that summarizes transcripts."
+        user_content = f"Please summarize the following content:\n\n{transcript}"
 
     messages = [
         {"role": "system", "content": system_content},
-        {"role": "user", "content":
+        {"role": "user", "content": user_content},
     ]
 
     # Get model-specific inference settings
@@ -490,8 +501,11 @@ def summarize_streaming(
         delta = chunk['choices'][0].get('delta', {})
         content = delta.get('content', '')
         if content:
-
-
+            if output_language == "zh-TW":
+                converted = converter.convert(content)
+                full_response += converted
+            else:
+                full_response += content
 
         thinking, summary = parse_thinking_blocks(full_response, streaming=True)
         current_thinking = thinking or ""
@@ -732,6 +746,12 @@ def create_interface():
             label="Model",
             info="Smaller = faster. Large files need models with bigger context."
         )
+        language_selector = gr.Dropdown(
+            choices=[("English", "en"), ("Traditional Chinese (zh-TW)", "zh-TW")],
+            value="en",
+            label="Output Language",
+            info="Select target language for the summary"
+        )
         enable_reasoning = gr.Checkbox(
             value=True,
             label="Enable Reasoning Mode",
@@ -808,7 +828,7 @@ def create_interface():
     # Event handlers
     submit_btn.click(
         fn=summarize_streaming,
-        inputs=[file_input, model_dropdown, enable_reasoning, max_tokens, top_p, top_k],
+        inputs=[file_input, model_dropdown, enable_reasoning, max_tokens, top_p, top_k, language_selector],
        outputs=[thinking_output, summary_output, info_output],
        show_progress="full"
    )
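
Gradio passes each component listed in `inputs` as a positional argument, so appending `language_selector` lines up with the new trailing `output_language` parameter of `summarize_streaming()`. A stripped-down sketch of that wiring (the component set is reduced here; the real handler takes the full input list shown above):

```python
import gradio as gr

def summarize(file_path, output_language="en"):
    # Stand-in for the real streaming summarizer in app.py.
    yield f"[{output_language}] summary placeholder for {file_path}"

with gr.Blocks() as demo:
    file_input = gr.File(label="Transcript file")
    language_selector = gr.Dropdown(
        # (label, value) pairs: the handler receives the value, "en" or "zh-TW".
        choices=[("English", "en"), ("Traditional Chinese (zh-TW)", "zh-TW")],
        value="en",
        label="Output Language",
    )
    summary_output = gr.Textbox(label="Summary")
    submit_btn = gr.Button("Summarize")
    submit_btn.click(fn=summarize,
                     inputs=[file_input, language_selector],
                     outputs=summary_output)

demo.launch()
```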
--- a/summarize_transcript.py
+++ b/summarize_transcript.py
@@ -63,24 +63,33 @@ def parse_thinking_blocks(content: str) -> Tuple[str, str]:
 
     return (thinking, summary)
 
-def stream_summarize_transcript(llm, transcript):
+def stream_summarize_transcript(llm, transcript, output_language="en"):
     """
     Perform live streaming summary by getting real-time token output from the model.
 
     Args:
         llm: The loaded language model
         transcript: The full transcript to summarize
+        output_language: Target language for summary ("en" or "zh-TW")
     """
     cc = OpenCC('s2twp')  # Simplified Chinese to Traditional Chinese (Taiwan standard with phrase conversion)
 
-    # Use the model's chat format based on its template
+    # Use the model's chat format based on its template and language
+    if output_language == "zh-TW":
+        system_msg = "你是一個有助的助手,負責總結轉錄內容。"
+        user_msg = f"請總結以下內容:\n\n{transcript}"
+    else:
+        system_msg = "You are a helpful assistant that summarizes transcripts."
+        user_msg = f"Please summarize the following content:\n\n{transcript}"
+
     messages = [
-        {"role": "system", "content":
-        {"role": "user", "content":
+        {"role": "system", "content": system_msg},
+        {"role": "user", "content": user_msg}
     ]
 
     # Generate the summary using streaming completion
-
+    lang_display = "zh-TW" if output_language == "zh-TW" else "English"
+    print(f"\nStreaming {lang_display} summary:")
     print("="*50)
 
     full_response = ""
@@ -101,9 +110,13 @@ def stream_summarize_transcript(llm, transcript):
         delta = chunk['choices'][0].get('delta', {})
         content = delta.get('content', '')
         if content:
-
-
-
+            if output_language == "zh-TW":
+                converted_content = cc.convert(content)
+                print(converted_content, end='', flush=True)
+                full_response += converted_content
+            else:
+                print(content, end='', flush=True)
+                full_response += content
 
     print("\n" + "="*50)
 
@@ -122,6 +135,8 @@ def main():
                         default="unsloth/Qwen3-0.6B-GGUF:Q4_0",
                         help="HuggingFace model in format repo_id:quant (e.g., unsloth/Qwen3-1.7B-GGUF:Q2_K_L)")
     parser.add_argument("-c", "--cpu", action="store_true", help="Force CPU only inference")
+    parser.add_argument("-l", "--language", type=str, choices=["en", "zh-TW"], default="en",
+                        help="Output language (default: en)")
     args = parser.parse_args()
 
     # Parse model argument if provided
@@ -148,8 +163,8 @@ def main():
     print("\nOriginal Transcript (Preview):")
     print(transcript[:500] + "..." if len(transcript) > 500 else transcript)
 
-    # Summarize
-    summary = stream_summarize_transcript(llm, transcript)
+    # Summarize with streaming
+    summary = stream_summarize_transcript(llm, transcript, output_language=args.language)
 
     # Save summaries to files
     # Parse thinking blocks and separate content
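
A quick standalone check of the new flag's semantics, using the same `add_argument` call as in the diff:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-l", "--language", type=str, choices=["en", "zh-TW"], default="en",
                    help="Output language (default: en)")

assert parser.parse_args([]).language == "en"                  # default is English
assert parser.parse_args(["-l", "zh-TW"]).language == "zh-TW"  # explicit Traditional Chinese
# Any other value, e.g. ["-l", "fr"], fails argparse's choices check and exits with an error.
```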