Luigi committed
Commit e78283f · 1 Parent(s): 6be837f

Add bilingual support: English and Traditional Chinese (zh-TW)


Web Interface (app.py):
- Add language selector dropdown in Advanced Settings
- Update summarize_streaming() to accept output_language parameter
- Implement dynamic prompts based on selected language
- Conditionally apply OpenCC conversion only for zh-TW output (sketched after this list)
- Pass language parameter through submit button click handler
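
For reference, a minimal self-contained sketch of the pattern these changes introduce in `summarize_streaming()`: language-dependent prompts plus OpenCC conversion applied only for zh-TW. The helper names `build_prompts` and `accumulate_chunk` are illustrative only, not names from the diff:

```python
from opencc import OpenCC

converter = OpenCC('s2twp')  # Simplified -> Traditional Chinese (Taiwan, phrase-aware)

def build_prompts(transcript: str, output_language: str) -> tuple[str, str]:
    """Return (system, user) prompts for the selected language, as in the diff below."""
    if output_language == "zh-TW":
        # zh prompts mean: "You are a helpful assistant that summarizes transcripts." /
        # "Please summarize the following content:"
        return ("你是一個有助的助手,負責總結轉錄內容。",
                f"請總結以下內容:\n\n{transcript}")
    return ("You are a helpful assistant that summarizes transcripts.",
            f"Please summarize the following content:\n\n{transcript}")

def accumulate_chunk(full_response: str, content: str, output_language: str) -> str:
    """Append one streamed token chunk; convert Simplified -> Traditional only for zh-TW."""
    if output_language == "zh-TW":
        return full_response + converter.convert(content)
    return full_response + content  # English output passes through unchanged
```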

CLI (summarize_transcript.py):
- Add -l/--language argument with choices [en, zh-TW], default en (behavior sketched after this list)
- Update stream_summarize_transcript() to handle language parameter
- Implement dynamic system/user prompts for both languages
- Conditionally apply OpenCC conversion based on language
- Update summary header to show selected language
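
The flag is plain argparse; a quick sketch of its behavior (the argument definition is copied from the diff below, and the lists passed to `parse_args` simulate command lines):

```python
import argparse

parser = argparse.ArgumentParser(description="Summarize a transcript")
parser.add_argument("-l", "--language", type=str, choices=["en", "zh-TW"], default="en",
                    help="Output language (default: en)")

args = parser.parse_args([])               # python summarize_transcript.py ...
print(args.language)                       # -> "en" (English is the default)

args = parser.parse_args(["-l", "zh-TW"])  # python summarize_transcript.py ... -l zh-TW
print(args.language)                       # -> "zh-TW" (enables OpenCC conversion)
```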

Documentation (AGENTS.md):
- Update CLI examples to show -l zh-TW flag
- Add language parameter to usage patterns
- Document conditional OpenCC conversion
- Update default language notes to reflect English default

Changes: 3 files, +64/-23 lines

Files changed (3)
  1. AGENTS.md +11 -5
  2. app.py +28 -8
  3. summarize_transcript.py +25 -10
AGENTS.md CHANGED
@@ -2,13 +2,14 @@
 
 ## Project Overview
 
-Tiny Scribe is a Python CLI tool and Gradio web app for summarizing transcripts using GGUF models (e.g., ERNIE, Qwen, Granite) with llama-cpp-python. It supports live streaming output and Traditional Chinese (zh-TW) conversion via OpenCC.
+Tiny Scribe is a Python CLI tool and Gradio web app for summarizing transcripts using GGUF models (e.g., ERNIE, Qwen, Granite) with llama-cpp-python. It supports live streaming output and bilingual summaries (English or Traditional Chinese zh-TW) via OpenCC.
 
 ## Build / Lint / Test Commands
 
 **Run the CLI script:**
 ```bash
-python summarize_transcript.py -i ./transcripts/short.txt
+python summarize_transcript.py -i ./transcripts/short.txt           # Default English output
+python summarize_transcript.py -i ./transcripts/short.txt -l zh-TW  # Traditional Chinese output
 python summarize_transcript.py -m unsloth/Qwen3-1.7B-GGUF:Q2_K_L
 python summarize_transcript.py -c  # CPU only
 ```
@@ -162,11 +163,15 @@ for chunk in stream:
     buffer = buffer[:thinking_match.start()] + buffer[thinking_match.end():]
 ```
 
-**Chinese Text Conversion:**
+**Chinese Text Conversion (zh-TW mode only):**
 ```python
 # Convert Simplified Chinese to Traditional Chinese (Taiwan)
 converter = OpenCC('s2twp')  # s2twp = Simplified to Traditional (Taiwan)
-traditional_text = converter.convert(simplified_text)
+# Only apply when output_language == "zh-TW"
+if output_language == "zh-TW":
+    traditional_text = converter.convert(simplified_text)
+else:
+    traditional_text = simplified_text  # Skip conversion for English
 ```
 
 ## Notes for AI Agents
@@ -175,7 +180,8 @@ traditional_text = converter.convert(simplified_text)
 - When modifying, maintain the existing streaming output pattern
 - Always call `llm.reset()` after completion to ensure state isolation
 - Model format: `repo_id:quant` (e.g., `unsloth/Qwen3-1.7B-GGUF:Q2_K_L`)
-- Default language output is Traditional Chinese (zh-TW) via OpenCC conversion
+- Default language output is English (zh-TW available via `-l zh-TW` flag or web UI dropdown)
+- OpenCC conversion only applied when output_language is "zh-TW"
 - HuggingFace cache at `~/.cache/huggingface/hub/` - clean periodically
 - HF Spaces runs on CPU tier with 2 vCPUs, 16GB RAM
 - Keep model sizes under 4GB for reasonable performance on free tier
app.py CHANGED
@@ -380,6 +380,7 @@ def summarize_streaming(
     max_tokens: int = 2048,
     top_p: float = None,
     top_k: int = None,
+    output_language: str = "en",
 ) -> Generator[Tuple[str, str, str], None, None]:
     """
     Stream summary generation from uploaded file.
@@ -391,6 +392,7 @@ def summarize_streaming(
         max_tokens: Maximum tokens to generate
         top_p: Nucleus sampling parameter (uses model default if None)
         top_k: Top-k sampling parameter (uses model default if None)
+        output_language: Target language for summary ("en" or "zh-TW")
 
     Yields:
         Tuple of (thinking_text, summary_text, info_text)
@@ -449,15 +451,24 @@ def summarize_streaming(
 
     # Prepare system prompt with reasoning toggle for Qwen3 models
     model = AVAILABLE_MODELS[model_key]
-    if model.get("supports_toggle"):
-        reasoning_mode = "/think" if enable_reasoning else "/no_think"
-        system_content = f"你是一個有助的助手,負責總結轉錄內容。{reasoning_mode}"
+    if output_language == "zh-TW":
+        if model.get("supports_toggle"):
+            reasoning_mode = "/think" if enable_reasoning else "/no_think"
+            system_content = f"你是一個有助的助手,負責總結轉錄內容。{reasoning_mode}"
+        else:
+            system_content = "你是一個有助的助手,負責總結轉錄內容。"
+        user_content = f"請總結以下內容:\n\n{transcript}"
     else:
-        system_content = "你是一個有助的助手,負責總結轉錄內容。"
+        if model.get("supports_toggle"):
+            reasoning_mode = "/think" if enable_reasoning else "/no_think"
+            system_content = f"You are a helpful assistant that summarizes transcripts. {reasoning_mode}"
+        else:
+            system_content = "You are a helpful assistant that summarizes transcripts."
+        user_content = f"Please summarize the following content:\n\n{transcript}"
 
     messages = [
         {"role": "system", "content": system_content},
-        {"role": "user", "content": f"請總結以下內容:\n\n{transcript}"},
+        {"role": "user", "content": user_content},
     ]
 
     # Get model-specific inference settings
@@ -490,8 +501,11 @@ def summarize_streaming(
             delta = chunk['choices'][0].get('delta', {})
             content = delta.get('content', '')
             if content:
-                converted = converter.convert(content)
-                full_response += converted
+                if output_language == "zh-TW":
+                    converted = converter.convert(content)
+                    full_response += converted
+                else:
+                    full_response += content
 
                 thinking, summary = parse_thinking_blocks(full_response, streaming=True)
                 current_thinking = thinking or ""
@@ -732,6 +746,12 @@ def create_interface():
                     label="Model",
                     info="Smaller = faster. Large files need models with bigger context."
                 )
+                language_selector = gr.Dropdown(
+                    choices=[("English", "en"), ("Traditional Chinese (zh-TW)", "zh-TW")],
+                    value="en",
+                    label="Output Language",
+                    info="Select target language for the summary"
+                )
                 enable_reasoning = gr.Checkbox(
                     value=True,
                     label="Enable Reasoning Mode",
@@ -808,7 +828,7 @@ def create_interface():
         # Event handlers
         submit_btn.click(
             fn=summarize_streaming,
-            inputs=[file_input, model_dropdown, enable_reasoning, max_tokens, top_p, top_k],
+            inputs=[file_input, model_dropdown, enable_reasoning, max_tokens, top_p, top_k, language_selector],
             outputs=[thinking_output, summary_output, info_output],
             show_progress="full"
        )
summarize_transcript.py CHANGED
@@ -63,24 +63,33 @@ def parse_thinking_blocks(content: str) -> Tuple[str, str]:
 
     return (thinking, summary)
 
-def stream_summarize_transcript(llm, transcript):
+def stream_summarize_transcript(llm, transcript, output_language="en"):
     """
     Perform live streaming summary by getting real-time token output from the model.
 
     Args:
         llm: The loaded language model
         transcript: The full transcript to summarize
+        output_language: Target language for summary ("en" or "zh-TW")
     """
     cc = OpenCC('s2twp')  # Simplified Chinese to Traditional Chinese (Taiwan standard with phrase conversion)
 
-    # Use the model's chat format based on its template
+    # Use the model's chat format based on its template and language
+    if output_language == "zh-TW":
+        system_msg = "你是一個有助的助手,負責總結轉錄內容。"
+        user_msg = f"請總結以下內容:\n\n{transcript}"
+    else:
+        system_msg = "You are a helpful assistant that summarizes transcripts."
+        user_msg = f"Please summarize the following content:\n\n{transcript}"
+
     messages = [
-        {"role": "system", "content": "你是一個有助的助手,負責總結轉錄內容。"},
-        {"role": "user", "content": f"請總結以下內容:\n\n{transcript}"}
+        {"role": "system", "content": system_msg},
+        {"role": "user", "content": user_msg}
     ]
 
     # Generate the summary using streaming completion
-    print(f"\nStreaming zh-TW summary:")
+    lang_display = "zh-TW" if output_language == "zh-TW" else "English"
+    print(f"\nStreaming {lang_display} summary:")
     print("="*50)
 
     full_response = ""
@@ -101,9 +110,13 @@ def stream_summarize_transcript(llm, transcript):
             delta = chunk['choices'][0].get('delta', {})
             content = delta.get('content', '')
             if content:
-                converted_content = cc.convert(content)
-                print(converted_content, end='', flush=True)
-                full_response += converted_content
+                if output_language == "zh-TW":
+                    converted_content = cc.convert(content)
+                    print(converted_content, end='', flush=True)
+                    full_response += converted_content
+                else:
+                    print(content, end='', flush=True)
+                    full_response += content
 
     print("\n" + "="*50)
 
@@ -122,6 +135,8 @@ def main():
                         default="unsloth/Qwen3-0.6B-GGUF:Q4_0",
                         help="HuggingFace model in format repo_id:quant (e.g., unsloth/Qwen3-1.7B-GGUF:Q2_K_L)")
     parser.add_argument("-c", "--cpu", action="store_true", help="Force CPU only inference")
+    parser.add_argument("-l", "--language", type=str, choices=["en", "zh-TW"], default="en",
+                        help="Output language (default: en)")
     args = parser.parse_args()
 
     # Parse model argument if provided
@@ -148,8 +163,8 @@ def main():
     print("\nOriginal Transcript (Preview):")
     print(transcript[:500] + "..." if len(transcript) > 500 else transcript)
 
-    # Summarize in Chinese (zh-TW) with streaming
-    summary = stream_summarize_transcript(llm, transcript)
+    # Summarize with streaming
+    summary = stream_summarize_transcript(llm, transcript, output_language=args.language)
 
     # Save summaries to files
     # Parse thinking blocks and separate content