dwarkesh committed
Commit 91be0ad · 1 Parent(s): d23d879

preview generator and transcript bold and formatting

prompts/previews.txt ADDED
@@ -0,0 +1,48 @@
+ You are a podcast producer tasked with selecting 5-10 short, engaging clips for a preview section at the start of the episode. These clips should:
+
+ - Be attention-grabbing and make listeners want to hear more
+ - Show the guest super animated, or show guest and host laughing together
+ - Each be roughly 5-15 seconds long
+ - Represent interesting moments, revelations, or powerful statements
+ - Work well together to give a taste of the episode's best content
+
+ Please listen to the audio and suggest 5-10 clips that would make great preview material. For each suggestion:
+ 1. Note the timestamp where the clip occurs
+ 2. Quote the relevant dialogue
+ 3. Briefly explain why this would make a good preview clip
+
+ Here are some examples of effective preview clips from past episodes:
+
+ Example 1 - David Reich episode:
+ - "There's just extinction after extinction of the Neanderthal groups, of the Denisovan groups, and of the modern human groups. But the last one standing is one of the modern human groups."
+ - "It's not even obvious that non-Africans today are modern humans. Maybe they're Neanderthals who became modernized by waves and waves of admixture."
+ - "Farmers who were just on the verge of encountering people from the steppe, a huge fraction of them have Black Death. [...] It's killing a scarily large fraction of the population."
+ - "A lot of people I know dropped off the paper. They just didn't want to be associated with it because it was so weird and they just thought it might be wrong, but it's stood up as far as I can tell."
+ - "70,000 years ago there are half a dozen different human species [...] And then [...] this group, [...] initially like 1000 to 10,000 people, [...] explodes all across the world."
+ - "I think there's been an assumption where Africa's been at the center of everything. [...] Models that are considered to be standard dogma are now low probability."
+
+ Example 2 - Dylan Patel and Jon episode:
+ - "Liang Mong Song is a nut. [...] He's like, 'we will make Samsung into this monster.' [...] He does not care about people. He does not care about business. [...] He wants to take it to the limit. That's the only thing."
+ - "There's no fucking way you can pay for the scale of clusters that are being planned to be built next year for OpenAI unless they raise like 50 to 100 billion dollars. [...] Hold on, hold on. We've already lost Jon. [...] We've already accepted that GPT-5 will be good? Hello? [...] You gotta, you know? [...] Life is so much more fun when you just are delusionally… [...] We're just ripping bong hits, are we? [...] We're not even close to the [...] dot-com bubble. [...] Why wouldn't this one be bigger? [...] We're gonna rip, baby. Rip that bong, baby! [...] You could freeze AI for another two decades!"
+ - "If you are Xi Jinping and scale-pilled, you must now centralize the compute resources, right? [...] They could have a bigger model than any of the labs next year."
+
+ Example 3 - Daniel Yergin episode:
+ - "There was an oil war within World War II. When Hitler invaded Russia, he was not only going for Moscow, he was also going for the oil fields of Baku. [...] The kamikaze pilots who would fly their planes into the aircraft carrier. One big reason they were doing that was to save fuel."
+ - "No one would be happier to see a ban on US shale production than Vladimir Putin. [...] I mentioned the word 'shale' and he erupted and kind of said, 'It's barbaric, it's terrible.' And he got really angry. [...] I don't think he never imagined that if he cut off the gas to Europe, that Europe could survive."
+ - "There's one projection that 10% of US electricity by 2030 [...] will be going to data centers."
+ - "A war that began with cavalry ended up with tanks and airplanes and trucks. [...] The Allies floated to victory on a sea of oil."
+
+ Example 4 - Sarah Paine episode:
+ - "And this notion that Stalin personally is responsible for these millions of deaths… There are millions of people pulling millions of triggers for all these deaths."
+ - "Initially, Hitler did incredibly well. I mean, his Blitzkrieg, incredible. [...] If he had quit right there [...] he would have gotten away with it and probably be considered a brilliant leader by Germans."
+ - "Putin, [...] he's made a pivotal error. He has no back down plan. He only has a double down plan."
+ - "For the People's Republic to take Taiwan, I presume it's going to begin with an artillery barrage. I presume that's going to be leveling Taiwanese cities, right? We've watched how it goes in Ukraine. I can't imagine the Chinese being less brutal. [...] You're going to say that's okay?"
+
+ Example 5 - Leopold Aschenbrenner episode:
+ - "What will be at stake is not just cool products but whether liberal democracy survives, whether the CCP survives. What the world order for the next century will be."
+ - "The CCP is going to have an all-out effort to infiltrate American AI labs. [...] Billions of dollars, thousands of people [...] The CCP is going to try to outbuild us. [...] People don't realize how intense state-level espionage can be."
+ - "When we have literal superintelligence on our cluster [...] and they can Stuxnet the Chinese data centers [...] you really think it'll be a private company? And the government won't be like, 'oh, my God, what is going on?'"
+ - "I do think it is incredibly important that these clusters are in the United States. [...] Would you do the Manhattan Project in the UAE?"
+ - "2023 was the moment for me where it went from 'AGI as a theoretical abstract thing' [...] to 'I see it, I feel it.' [...] I can see the cluster where it's trained on, the rough combination of algorithms, the people, how it's happening. [...] Most of the people who feel it are right here."
+
+ Please analyze the provided audio and suggest preview clips in a similar format.
scripts/preview_generator.py ADDED
@@ -0,0 +1,92 @@
+import argparse
+from pathlib import Path
+import os
+from google import generativeai
+from pydub import AudioSegment
+
+
+class PreviewGenerator:
+    """Handles generating preview suggestions using Gemini"""
+
+    def __init__(self, api_key: str):
+        generativeai.configure(api_key=api_key)
+        self.model = generativeai.GenerativeModel("gemini-exp-1206")
+        self.prompt = Path("prompts/previews.txt").read_text()
+
+    async def generate_previews(self, audio_path: Path, transcript_path: Path = None) -> str:
+        """Generate preview suggestions for the given audio file and optional transcript"""
+        print("Generating preview suggestions...")
+
+        # Load and compress audio for Gemini
+        audio = AudioSegment.from_file(audio_path)
+
+        # Create a buffer for the compressed audio
+        import io
+        buffer = io.BytesIO()
+        # Use lower quality MP3 for faster processing
+        audio.export(buffer, format="mp3", parameters=["-q:a", "9"])
+        buffer.seek(0)
+
+        # Use the File API to upload the audio
+        audio_file = generativeai.upload_file(buffer, mime_type="audio/mp3")
+
+        # Prepare content for Gemini
+        content = [self.prompt]
+        content.append(audio_file)  # Add the uploaded file reference
+
+        # Add transcript if provided
+        if transcript_path and transcript_path.exists():
+            print("Including transcript in analysis...")
+            # Upload transcript as a file too
+            transcript_file = generativeai.upload_file(transcript_path)
+            content.append(transcript_file)
+
+        # Generate suggestions using Gemini
+        response = await self.model.generate_content_async(content)
+
+        return response.text
+
+
+async def main():
+    parser = argparse.ArgumentParser(description="Generate podcast preview suggestions")
+    parser.add_argument("audio_file", help="Audio file to analyze")
+    parser.add_argument("--transcript", "-t", help="Optional transcript file")
+    args = parser.parse_args()
+
+    audio_path = Path(args.audio_file)
+    if not audio_path.exists():
+        raise FileNotFoundError(f"File not found: {audio_path}")
+
+    transcript_path = Path(args.transcript) if args.transcript else None
+    if transcript_path and not transcript_path.exists():
+        print(f"Warning: Transcript file not found: {transcript_path}")
+        transcript_path = None
+
+    # Ensure output directory exists
+    output_dir = Path("output")
+    output_dir.mkdir(exist_ok=True)
+    output_path = output_dir / "previews.txt"
+
+    try:
+        generator = PreviewGenerator(os.getenv("GOOGLE_API_KEY"))
+        suggestions = await generator.generate_previews(audio_path, transcript_path)
+
+        # Save output
+        output_path.write_text(suggestions)
+        print(f"\nPreview suggestions saved to: {output_path}")
+
+        # Also print to console
+        print("\nPreview Suggestions:")
+        print("-" * 40)
+        print(suggestions)
+
+    except Exception as e:
+        print(f"Error: {e}")
+        return 1
+
+    return 0
+
+
+if __name__ == "__main__":
+    import asyncio
+    asyncio.run(main())
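The new script's command-line surface is small enough to check in isolation. Below is a minimal sketch of how the argparse setup above handles a sample invocation; the filenames are hypothetical and no audio processing or API call is involved:

```python
import argparse
from pathlib import Path

# Mirror the CLI defined in scripts/preview_generator.py
parser = argparse.ArgumentParser(description="Generate podcast preview suggestions")
parser.add_argument("audio_file", help="Audio file to analyze")
parser.add_argument("--transcript", "-t", help="Optional transcript file")

# Parse a hypothetical invocation instead of reading sys.argv
args = parser.parse_args(["episode.mp3", "--transcript", "transcript.md"])
print(args.audio_file)   # episode.mp3
print(args.transcript)   # transcript.md
print(Path(args.transcript).suffix)  # .md
```

The `-t` short flag stores into the same `args.transcript` attribute, and omitting it leaves the attribute as `None`, which is what the `transcript_path = Path(args.transcript) if args.transcript else None` line relies on.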
scripts/transcript.py CHANGED
@@ -159,10 +159,15 @@ class SpeakerDialogue:
159
  """Format start time as HH:MM:SS"""
160
  return self.utterances[0].timestamp
161
 
162
- def format(self) -> str:
163
- """Format this dialogue as text with newlines between utterances"""
 
 
 
164
  texts = [u.text + "\n\n" for u in self.utterances] # Add two newlines after each utterance
165
  combined_text = ''.join(texts).rstrip() # Remove trailing whitespace at the end
 
 
166
  return f"Speaker {self.speaker} {self.timestamp}\n\n{combined_text}"
167
 
168
 
@@ -218,9 +223,13 @@ def chunk_dialogues(
218
  return chunks
219
 
220
 
221
- def format_chunk(dialogues: List[SpeakerDialogue]) -> str:
222
- """Format a chunk of dialogues into readable text"""
223
- return "\n\n".join(dialogue.format() for dialogue in dialogues)
 
 
 
 
224
 
225
 
226
  def prepare_audio_chunks(audio_path: Path, utterances: List[Utterance]) -> List[Tuple[str, io.BytesIO]]:
@@ -241,7 +250,8 @@ def prepare_audio_chunks(audio_path: Path, utterances: List[Utterance]) -> List[
241
  buffer = io.BytesIO()
242
  # Use lower quality MP3 for faster processing
243
  segment.export(buffer, format="mp3", parameters=["-q:a", "9"])
244
- prepared.append((format_chunk(chunk), buffer))
 
245
 
246
  return prepared
247
 
@@ -265,7 +275,7 @@ def main():
265
 
266
  # Save original transcript
267
  dialogues = list(group_utterances_by_speaker(utterances)) # Convert iterator to list
268
- original = format_chunk(dialogues)
269
  (out_dir / "autogenerated-transcript.md").write_text(original)
270
 
271
  # Enhance transcript
@@ -273,8 +283,10 @@ def main():
273
  chunks = prepare_audio_chunks(audio_path, utterances)
274
  enhanced = asyncio.run(enhancer.enhance_chunks(chunks))
275
 
276
- # Save enhanced transcript
277
  merged = "\n\n".join(chunk.strip() for chunk in enhanced)
 
 
278
  (out_dir / "transcript.md").write_text(merged)
279
 
280
  print("\nTranscripts saved to:")
@@ -288,5 +300,12 @@ def main():
288
  return 0
289
 
290
 
 
 
 
 
 
 
 
291
  if __name__ == "__main__":
292
  main()
 
159
  """Format start time as HH:MM:SS"""
160
  return self.utterances[0].timestamp
161
 
162
+ def format(self, markdown: bool = False) -> str:
163
+ """Format this dialogue as text with newlines between utterances
164
+ Args:
165
+ markdown: If True, add markdown formatting for speaker and timestamp
166
+ """
167
  texts = [u.text + "\n\n" for u in self.utterances] # Add two newlines after each utterance
168
  combined_text = ''.join(texts).rstrip() # Remove trailing whitespace at the end
169
+ if markdown:
170
+ return f"**Speaker {self.speaker}** *{self.timestamp}*\n\n{combined_text}"
171
  return f"Speaker {self.speaker} {self.timestamp}\n\n{combined_text}"
172
 
173
 
 
223
  return chunks
224
 
225
 
226
+ def format_chunk(dialogues: List[SpeakerDialogue], markdown: bool = False) -> str:
227
+ """Format a chunk of dialogues into readable text
228
+ Args:
229
+ dialogues: List of dialogues to format
230
+ markdown: If True, add markdown formatting for speaker and timestamp
231
+ """
232
+ return "\n\n".join(dialogue.format(markdown=markdown) for dialogue in dialogues)
233
 
234
 
235
  def prepare_audio_chunks(audio_path: Path, utterances: List[Utterance]) -> List[Tuple[str, io.BytesIO]]:
 
250
  buffer = io.BytesIO()
251
  # Use lower quality MP3 for faster processing
252
  segment.export(buffer, format="mp3", parameters=["-q:a", "9"])
253
+ # Use non-markdown format for Gemini
254
+ prepared.append((format_chunk(chunk, markdown=False), buffer))
255
 
256
  return prepared
257
 
 
275
 
276
  # Save original transcript
277
  dialogues = list(group_utterances_by_speaker(utterances)) # Convert iterator to list
278
+ original = format_chunk(dialogues, markdown=True) # Use markdown for final output
279
  (out_dir / "autogenerated-transcript.md").write_text(original)
280
 
281
  # Enhance transcript
 
283
  chunks = prepare_audio_chunks(audio_path, utterances)
284
  enhanced = asyncio.run(enhancer.enhance_chunks(chunks))
285
 
286
+ # Save enhanced transcript with markdown
287
  merged = "\n\n".join(chunk.strip() for chunk in enhanced)
288
+ # Apply markdown formatting to the final enhanced transcript
289
+ merged = apply_markdown_formatting(merged)
290
  (out_dir / "transcript.md").write_text(merged)
291
 
292
  print("\nTranscripts saved to:")
 
300
  return 0
301
 
302
 
303
+ def apply_markdown_formatting(text: str) -> str:
304
+ """Apply markdown formatting to speaker and timestamp in the transcript"""
305
+ import re
306
+ pattern = r"(Speaker \w+) (\d{2}:\d{2}:\d{2})"
307
+ return re.sub(pattern, r"**\1** *\2*", text)
308
+
309
+
310
  if __name__ == "__main__":
311
  main()
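The new `apply_markdown_formatting` helper is a plain regex substitution over the enhanced transcript, so its effect can be verified standalone. A minimal sketch, using a made-up transcript line in the non-markdown `Speaker X HH:MM:SS` format that `format()` emits:

```python
import re

def apply_markdown_formatting(text: str) -> str:
    """Bold the speaker label and italicize the HH:MM:SS timestamp."""
    pattern = r"(Speaker \w+) (\d{2}:\d{2}:\d{2})"
    return re.sub(pattern, r"**\1** *\2*", text)

# Hypothetical line matching the plain transcript header format
line = "Speaker A 00:01:23\n\nWelcome back to the show."
print(apply_markdown_formatting(line))
# **Speaker A** *00:01:23*
#
# Welcome back to the show.
```

Because the pattern requires a full `HH:MM:SS` timestamp immediately after the speaker label, ordinary dialogue text that happens to contain the word "Speaker" is left untouched.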