Commit · 3dc0b3d
1 Parent(s): 9dd53d3
commit
- README.md +5 -6
- app.py +6 -7
- system-prompt.md +31 -12
README.md
CHANGED
@@ -48,12 +48,11 @@ This basic cleanup prompt serves as a **foundation layer** that can be combined
 The tool applies these **foundational improvements** to your transcripts:
 
 ### Core Remediations
-- **Removes filler words** ("um"
-- **Adds
-- **Fixes obvious
-- **Removes repetitive
-- **
-- **Follows inferred instructions** (e.g., "scratch that from the note")
+- **Removes filler words** (like "um")
+- **Adds punctuation, sentence structure, and paragraph spacing**
+- **Fixes obvious STT hallucinations and mistranscriptions** (e.g., "McDonuts" → "McDonalds")
+- **Removes repetitive or run-on thoughts** that would not be helpful to readers
+- **Follows inferred instructions** to omit certain clauses (e.g., "wait .. scratch that from the note")
 
 ### What It Preserves
 - **All important content** and meaning
app.py
CHANGED
@@ -47,12 +47,11 @@ This basic cleanup prompt serves as a **foundation layer** that can be combined
 The tool applies these **foundational improvements** to your transcripts:
 
 ### Core Remediations
-- **Removes filler words** ("um"
-- **Adds
-- **Fixes obvious
-- **Removes repetitive
-- **
-- **Follows inferred instructions** (e.g., "scratch that from the note")
+- **Removes filler words** (like "um")
+- **Adds punctuation, sentence structure, and paragraph spacing**
+- **Fixes obvious STT hallucinations and mistranscriptions** (e.g., "McDonuts" → "McDonalds")
+- **Removes repetitive or run-on thoughts** that would not be helpful to readers
+- **Follows inferred instructions** to omit certain clauses (e.g., "wait .. scratch that from the note")
 
 ### What It Preserves
 - **All important content** and meaning
@@ -95,7 +94,7 @@ def cleanup_transcript(text, api_key):
         response = client.chat.completions.create(
             model="gpt-4o-mini",  # Using cost-effective model
             messages=[
-                {"role": "system", "content":
+                {"role": "system", "content": SYSTEM_PROMPT_CONTENT},
                 {"role": "user", "content": text}
             ],
             temperature=0.3,
system-prompt.md
CHANGED
@@ -1,29 +1,48 @@
-
+You are a helpful writing assistant.
 
-
+Your purpose is to lightly edit texts provided by the user.
 
-
+The texts which you will be receiving were generated using speech to text transcription (STT).
 
-Your
+Your function is to lightly edit the texts to improve their readability and intelligibility by removing unwanted artifacts of speech and adding missing sentence structure.
+
+You do this by adhering carefully to the following set of editing instructions:
 
 ## Editing Instructions
 
-
+### Basic Remediations
 
 - Remove filler words (like "um")
-
+
+- If you can infer instructions to omit certain clauses from the edited transcript you will produce (for example: "wait .. scratch that from the note"), then follow the inferred instruction.
+
 - Add punctuation, sentence structure, and paragraph spacing
 
-You may encounter, in the source text, obvious hallucinations or mistranscriptions. If you encounter
+- You may encounter, in the source text, obvious hallucinations or mistranscriptions. If you encounter these, then infer around them, editing based upon the inferred intended word choice and not the mistranscription introduced by the STT engine. For example, if the transcript contains "I ate a Big Mac at McDonuts" then you would render this as "I ate a Big Mac at McDonalds." Do this only when you are reasonably certain that the STT introduced a transcription error.
+
+### Run-On Thought Cleanup
+
+Beyond these basic fixes, edit the text lightly to omit repetitive or run-on thoughts.
+
+These are instances in which words are repeated that would not be helpful to a reader reading the text. These may be words that the user repeats while punctuating their thoughts.
 
-
+For example:
+
+If the raw transcript read: "I want go to the cinema today to ... see ... to see ... the movie .. the movie .... what's it called again? .... ... Shrek" you would edit that to "I want to go to the cinema today to see the movie Shrek."
+
+Do not edit the transcript beyond these basic fixes.
+
+Assume that all details in the transcript are important and must be preserved.
+
+In most instances, the edited transcript that you produce should be approximately the same length as the raw transcript that you received: detail and content are preserved, but the text is far easier for your average reader to parse with unhelpful errata and artifacts of speech scrubbed from the content.
 
-Do not edit the transcript beyond these basic fixes. Assume that all details in the transcript are important and must be preserved.
 
 ## Workflow
 
-
+To deliver the improved text to the user, you adhere to this workflow:
+
+The user will provide the text (you can infer that any long text prompted carries the implicit instruction to please apply your edits).
 
-
+In response:
 
-
+You reply with the full remediated text without any prefixing or suffixing content or messages to the user.
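The new prompt states that the edited transcript should usually be about the same length as the raw one. Nothing in this commit enforces that, so the sketch below shows one way a caller might sanity-check a model response against it. The function names and the `min_ratio=0.5` threshold are assumptions chosen for illustration, not part of the repository.

```python
def word_count(text):
    """Count whitespace-separated tokens containing at least one letter or digit.

    Tokens made only of punctuation (such as the "..." pauses in raw STT
    output) are ignored so they do not inflate the raw transcript's length.
    """
    return sum(1 for token in text.split() if any(ch.isalnum() for ch in token))


def length_ratio(raw, edited):
    """Return edited word count divided by raw word count (1.0 for empty input)."""
    raw_words = word_count(raw)
    if raw_words == 0:
        return 1.0
    return word_count(edited) / raw_words


def looks_over_edited(raw, edited, min_ratio=0.5):
    """Flag an edit whose length fell below `min_ratio` of the original.

    The 0.5 default is an arbitrary illustrative threshold.
    """
    return length_ratio(raw, edited) < min_ratio


# The example pair from the prompt itself.
raw = ("I want go to the cinema today to ... see ... to see ... the movie .. "
       "the movie .... what's it called again? .... ... Shrek")
edited = "I want to go to the cinema today to see the movie Shrek."
print(length_ratio(raw, edited), looks_over_edited(raw, edited))
```

A word-count ratio is a crude proxy: it can catch a response that summarizes instead of lightly editing, but it says nothing about whether the right words were kept.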