Spaces: danielrosehill/Basic-STT-Transcript-Cleanup


danielrosehill committed on Sep 28

Commit

3dc0b3d

1 Parent(s): 9dd53d3

commit

Files changed (3)

README.md +5 -6
app.py +6 -7
system-prompt.md +31 -12

README.md CHANGED

@@ -48,12 +48,11 @@ This basic cleanup prompt serves as a **foundation layer** that can be combined
 The tool applies these **foundational improvements** to your transcripts:
 ### Core Remediations
-- **Removes filler words** ("um", "uh", "like", etc.)
-- **Adds proper punctuation** and sentence structure
-- **Fixes obvious transcription errors** and hallucinations
-- **Removes repetitive phrases** and run-on thoughts
-- **Improves paragraph spacing** for readability
-- **Follows inferred instructions** (e.g., "scratch that from the note")
 ### What It Preserves
 - **All important content** and meaning

 The tool applies these **foundational improvements** to your transcripts:
 ### Core Remediations
+- **Removes filler words** (like "um")
+- **Adds punctuation, sentence structure, and paragraph spacing**
+- **Fixes obvious STT hallucinations and mistranscriptions** (e.g., "McDonuts" → "McDonalds")
+- **Removes repetitive or run-on thoughts** that would not be helpful to readers
+- **Follows inferred instructions** to omit certain clauses (e.g., "wait .. scratch that from the note")
 ### What It Preserves
 - **All important content** and meaning

app.py CHANGED

@@ -47,12 +47,11 @@ This basic cleanup prompt serves as a **foundation layer** that can be combined
 The tool applies these **foundational improvements** to your transcripts:
 ### Core Remediations
-- **Removes filler words** ("um", "uh", "like", etc.)
-- **Adds proper punctuation** and sentence structure
-- **Fixes obvious transcription errors** and hallucinations
-- **Removes repetitive phrases** and run-on thoughts
-- **Improves paragraph spacing** for readability
-- **Follows inferred instructions** (e.g., "scratch that from the note")
 ### What It Preserves
 - **All important content** and meaning
@@ -95,7 +94,7 @@ def cleanup_transcript(text, api_key):
         response = client.chat.completions.create(
             model="gpt-4o-mini",  # Using cost-effective model
             messages=[
-                {"role": "system", "content": SYSTEM_PROMPT},
                 {"role": "user", "content": text}
             ],
             temperature=0.3,

 The tool applies these **foundational improvements** to your transcripts:
 ### Core Remediations
+- **Removes filler words** (like "um")
+- **Adds punctuation, sentence structure, and paragraph spacing**
+- **Fixes obvious STT hallucinations and mistranscriptions** (e.g., "McDonuts" → "McDonalds")
+- **Removes repetitive or run-on thoughts** that would not be helpful to readers
+- **Follows inferred instructions** to omit certain clauses (e.g., "wait .. scratch that from the note")
 ### What It Preserves
 - **All important content** and meaning
         response = client.chat.completions.create(
             model="gpt-4o-mini",  # Using cost-effective model
             messages=[
+                {"role": "system", "content": SYSTEM_PROMPT_CONTENT},
                 {"role": "user", "content": text}
             ],
             temperature=0.3,
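
The hunk above only shows `SYSTEM_PROMPT_CONTENT` being passed as the system message; it does not show how that constant is defined. A minimal sketch, assuming the prompt text is read from `system-prompt.md` at startup. `load_system_prompt` and `build_messages` are hypothetical helper names, not functions from this repository:

```python
from pathlib import Path


def load_system_prompt(path="system-prompt.md"):
    """Read the prompt file and return its text without surrounding whitespace.

    Assumption: the prompt lives in a UTF-8 markdown file next to app.py.
    """
    return Path(path).read_text(encoding="utf-8").strip()


def build_messages(system_prompt, transcript):
    """Assemble the two-message payload shown in the chat completion call."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": transcript},
    ]
```

With helpers like these, `SYSTEM_PROMPT_CONTENT = load_system_prompt()` could be evaluated once at import time and `build_messages(SYSTEM_PROMPT_CONTENT, text)` passed as the `messages` argument.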

system-prompt.md CHANGED

@@ -1,29 +1,48 @@
-# Speech To Text (STT): Basic Text Cleanup / Remediation Prompt: V3 (28/Sep/2025)
-You are a helpful writing assistant.
-Your purpose is to lightly edit texts provided by the user. The texts which you will be receiving were generated using speech to text technology. Your function is to lightly edit the texts to improve their readability and intelligibility.
-Your overarching objective is to take "raw" text and reformat it for readiability.
 ## Editing Instructions
-You should apply the following set of remediations:
 - Remove filler words (like "um")
-- If you can infer instructions to omit certain clauses from the edited transcript you will produce (For example: "wait .. scratch that from the note") then follow the inferred instruction
 - Add punctuation, sentence structure, and paragraph spacing'
-You may encounter, in the source text, obvious hallucinations or mistranscriptions. If you encounter either, infer the intended meaning rather than the version in the transcript.
-Beyond these basic fixes, you should edit the text lightly to omit repetitive or run on thoughts where words are repeated that would not be helpful in transcript. For example if the raw transcript read: "I want go to the cinema today to ... see ... to see ... the movie .. the movie ... Shrek" you would edit that to "I want to go to the cinema today to see the movie Shrek."
-Do not edit the transcript beyond these basic fixes. Assume that all details in the transcipt are important and must be preserved.
 ## Workflow
-You adhere to the following workflow:
-The user will provide the text (you can infer that any long text prompted carries the implicit instruction to please apply your edirts).
-In response: you reply with the full remediated text without any prefixing or suffixing content or messages to the user.

+You are a helpful writing assistant.
+Your purpose is to lightly edit texts provided by the user.
+The texts which you will be receiving were generated using speech to text transcription (STT).
+Your function is to lightly edit the texts to improve their readability and intelligibility by removing unwanted artifacts of speech and adding missing sentence structure.
+You do so by adhering carefully to the following set of editing instructions:
 ## Editing Instructions
+### Basic Remediations
 - Remove filler words (like "um")
+- If you can infer instructions to omit certain clauses from the edited transcript you will produce (For example: "wait .. scratch that from the note") then follow the inferred instruction.
 - Add punctuation, sentence structure, and paragraph spacing'
+- You may encounter, in the source text, obvious hallucinations or mistranscriptions. If you encounter these, then infer around them, editing based upon the inferred intended word choice and not the mistranscription  introduced by the STT engine. For example, if the transcript contains "I ate a Big Mac at McDonuts" then you would render this as "I ate a Big Mac at McDonalds." Do this only when you are reasonably certain that the STT introduced a transcription error.
+### Run-On Thought Cleanup
+Beyond these basic fixes, edit the text lightly to omit repetitive or run on thoughts.
+These are instances in which words are repeated that would not be helpful to a reader reading the text. These may be words that the user repeats while punctuating their thoughts.
+For example:
+If the raw transcript read: "I want go to the cinema today to ... see ... to see ... the movie .. the movie .... what's it called again? ....  ... Shrek" you would edit that to "I want to go to the cinema today to see the movie Shrek."
+Do not edit the transcript beyond these basic fixes.
+Assume that all details in the transcript are important and must be preserved.
+In most instances, the edited transcript that you produce should be approximately the same length as the raw transcript that you received: detail and content are preserved, but the text is far easier for your average reader to parse with unhelpful errata and artifacts of speech scrubbed from the content.
 ## Workflow
+To deliver the improved text to the user, you adhere to this workflow:
+The user will provide the text (you can infer that any long text prompted carries the implicit instruction to please apply your edits).
+In response:
+You reply with the full remediated text without any prefixing or suffixing content or messages to the user.
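
The new prompt states that the edited transcript should usually be approximately the same length as the raw one. One rough way to check model output against that expectation is a word-count ratio. The sketch below is illustrative only: the function names and the 0.6 threshold are assumptions, not part of this commit, and a low ratio is merely a hint that the model may have summarized rather than lightly edited.

```python
def length_ratio(raw, cleaned):
    """Return cleaned word count divided by raw word count (0.0 if raw is empty)."""
    raw_words = len(raw.split())
    if raw_words == 0:
        return 0.0
    return len(cleaned.split()) / raw_words


def looks_over_edited(raw, cleaned, min_ratio=0.6):
    """Flag output that is much shorter than the input.

    min_ratio is an arbitrary illustrative threshold; heavily disfluent
    transcripts can legitimately shrink below it.
    """
    return length_ratio(raw, cleaned) < min_ratio


raw = "um so I went to the the store today"
print(looks_over_edited(raw, "So I went to the store today."))  # False (7 of 9 words kept)
print(looks_over_edited(raw, "Went shopping."))                 # True (2 of 9 words kept)
```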