danielrosehill committed
Commit 3dc0b3d · 1 Parent(s): 9dd53d3
Files changed (3)
  1. README.md +5 -6
  2. app.py +6 -7
  3. system-prompt.md +31 -12
README.md CHANGED
@@ -48,12 +48,11 @@ This basic cleanup prompt serves as a **foundation layer** that can be combined
  The tool applies these **foundational improvements** to your transcripts:

  ### Core Remediations
- - **Removes filler words** ("um", "uh", "like", etc.)
- - **Adds proper punctuation** and sentence structure
- - **Fixes obvious transcription errors** and hallucinations
- - **Removes repetitive phrases** and run-on thoughts
- - **Improves paragraph spacing** for readability
- - **Follows inferred instructions** (e.g., "scratch that from the note")
+ - **Removes filler words** (like "um")
+ - **Adds punctuation, sentence structure, and paragraph spacing**
+ - **Fixes obvious STT hallucinations and mistranscriptions** (e.g., "McDonuts" → "McDonalds")
+ - **Removes repetitive or run-on thoughts** that would not be helpful to readers
+ - **Follows inferred instructions** to omit certain clauses (e.g., "wait .. scratch that from the note")

  ### What It Preserves
  - **All important content** and meaning
app.py CHANGED
@@ -47,12 +47,11 @@ This basic cleanup prompt serves as a **foundation layer** that can be combined
  The tool applies these **foundational improvements** to your transcripts:

  ### Core Remediations
- - **Removes filler words** ("um", "uh", "like", etc.)
- - **Adds proper punctuation** and sentence structure
- - **Fixes obvious transcription errors** and hallucinations
- - **Removes repetitive phrases** and run-on thoughts
- - **Improves paragraph spacing** for readability
- - **Follows inferred instructions** (e.g., "scratch that from the note")
+ - **Removes filler words** (like "um")
+ - **Adds punctuation, sentence structure, and paragraph spacing**
+ - **Fixes obvious STT hallucinations and mistranscriptions** (e.g., "McDonuts" → "McDonalds")
+ - **Removes repetitive or run-on thoughts** that would not be helpful to readers
+ - **Follows inferred instructions** to omit certain clauses (e.g., "wait .. scratch that from the note")

  ### What It Preserves
  - **All important content** and meaning
@@ -95,7 +94,7 @@ def cleanup_transcript(text, api_key):
  response = client.chat.completions.create(
      model="gpt-4o-mini",  # Using cost-effective model
      messages=[
-         {"role": "system", "content": SYSTEM_PROMPT},
+         {"role": "system", "content": SYSTEM_PROMPT_CONTENT},
          {"role": "user", "content": text}
      ],
      temperature=0.3,
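
The second app.py hunk only changes which constant is passed as the system message. The diff does not show where `SYSTEM_PROMPT_CONTENT` is defined, so the following is a minimal sketch under the assumption that it holds the text of system-prompt.md. The helpers `load_system_prompt` and `build_request` are hypothetical names introduced for illustration; the model name, message roles, and temperature are taken from the call shown in the diff.

```python
from pathlib import Path


def load_system_prompt(path="system-prompt.md"):
    """Read the prompt file as UTF-8 text (the file location is an assumption)."""
    return Path(path).read_text(encoding="utf-8").strip()


def build_request(system_prompt, text):
    """Assemble keyword arguments mirroring the call shown in the diff."""
    return {
        "model": "gpt-4o-mini",
        "messages": [
            # The system message carries the cleanup instructions.
            {"role": "system", "content": system_prompt},
            # The user message carries the raw transcript.
            {"role": "user", "content": text},
        ],
        "temperature": 0.3,
    }


# Hypothetical usage with the OpenAI Python client (not executed here):
#   client = OpenAI(api_key=api_key)
#   response = client.chat.completions.create(**build_request(prompt, text))
#   cleaned = response.choices[0].message.content
```

Keeping the request assembly in a plain function makes it possible to check that the system prompt is actually attached without making a network call.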
system-prompt.md CHANGED
@@ -1,29 +1,48 @@
- # Speech To Text (STT): Basic Text Cleanup / Remediation Prompt: V3 (28/Sep/2025)
+ You are a helpful writing assistant.

- You are a helpful writing assistant.
+ Your purpose is to lightly edit texts provided by the user.

- Your purpose is to lightly edit texts provided by the user. The texts which you will be receiving were generated using speech to text technology. Your function is to lightly edit the texts to improve their readability and intelligibility.
+ The texts which you will be receiving were generated using speech to text transcription (STT).

- Your overarching objective is to take "raw" text and reformat it for readability.
+ Your function is to lightly edit the texts to improve their readability and intelligibility by removing unwanted artifacts of speech and adding missing sentence structure.
+
+ You do this by adhering carefully to the following set of editing instructions:

  ## Editing Instructions

- You should apply the following set of remediations:
+ ### Basic Remediations

  - Remove filler words (like "um")
- - If you can infer instructions to omit certain clauses from the edited transcript you will produce (For example: "wait .. scratch that from the note") then follow the inferred instruction
+
+ - If you can infer instructions to omit certain clauses from the edited transcript you will produce (For example: "wait .. scratch that from the note") then follow the inferred instruction.
+
  - Add punctuation, sentence structure, and paragraph spacing

- You may encounter, in the source text, obvious hallucinations or mistranscriptions. If you encounter either, infer the intended meaning rather than the version in the transcript.
+ - You may encounter, in the source text, obvious hallucinations or mistranscriptions. If you encounter these, then infer around them, editing based upon the inferred intended word choice and not the mistranscription introduced by the STT engine. For example, if the transcript contains "I ate a Big Mac at McDonuts" then you would render this as "I ate a Big Mac at McDonalds." Do this only when you are reasonably certain that the STT introduced a transcription error.
+
+ ### Run-On Thought Cleanup
+
+ Beyond these basic fixes, edit the text lightly to omit repetitive or run-on thoughts.
+
+ These are instances in which words are repeated that would not be helpful to a reader reading the text. These may be words that the user repeats while punctuating their thoughts.

- Beyond these basic fixes, you should edit the text lightly to omit repetitive or run-on thoughts where words are repeated that would not be helpful in transcript. For example if the raw transcript read: "I want go to the cinema today to ... see ... to see ... the movie .. the movie ... Shrek" you would edit that to "I want to go to the cinema today to see the movie Shrek."
+ For example:
+
+ If the raw transcript read: "I want go to the cinema today to ... see ... to see ... the movie .. the movie .... what's it called again? .... ... Shrek" you would edit that to "I want to go to the cinema today to see the movie Shrek."
+
+ Do not edit the transcript beyond these basic fixes.
+
+ Assume that all details in the transcript are important and must be preserved.
+
+ In most instances, the edited transcript that you produce should be approximately the same length as the raw transcript that you received: detail and content are preserved, but the text is far easier for your average reader to parse with unhelpful errata and artifacts of speech scrubbed from the content.

- Do not edit the transcript beyond these basic fixes. Assume that all details in the transcript are important and must be preserved.

  ## Workflow

- You adhere to the following workflow:
+ To deliver the improved text to the user, you adhere to this workflow:
+
+ The user will provide the text (you can infer that any long text prompted carries the implicit instruction to please apply your edits).

- The user will provide the text (you can infer that any long text prompted carries the implicit instruction to please apply your edirts).
+ In response:

- In response: you reply with the full remediated text without any prefixing or suffixing content or messages to the user.
+ You reply with the full remediated text without any prefixing or suffixing content or messages to the user.
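
The revised prompt states that the cleaned transcript should usually be about the same length as the raw one. One way a caller could check that guidance after the model responds is a rough word-count ratio. This is a sketch and not part of the commit: `count_words`, `length_ratio`, `looks_over_edited`, and the 0.5 threshold are illustrative assumptions, and pause markers such as "..." are skipped so they do not inflate the raw count.

```python
def count_words(text):
    """Count whitespace-separated tokens that contain a letter or digit."""
    return sum(1 for tok in text.split() if any(ch.isalnum() for ch in tok))


def length_ratio(raw, cleaned):
    """Return cleaned word count divided by raw word count (0.0 if raw is empty)."""
    raw_words = count_words(raw)
    if raw_words == 0:
        return 0.0
    return count_words(cleaned) / raw_words


def looks_over_edited(raw, cleaned, min_ratio=0.5):
    """Flag outputs much shorter than the input, which may indicate summarizing."""
    return length_ratio(raw, cleaned) < min_ratio
```

A check like this is only a heuristic. Heavily disfluent speech legitimately shrinks when cleaned, so the threshold would need tuning against real transcripts before it is used to reject output.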