ysharma (HF staff) committed
Commit fcd9555 • 1 Parent(s): e905a92

updated blog

Files changed (1): app.py +15 -3
app.py CHANGED
@@ -229,14 +229,13 @@ with demo:
 examples.click(load_examples, examples, input_video)
 
 with gr.Row():
- gr.Markdown("""
- I will start with a short note on my understanding of Radames's app and the tools used in it -
+ gr.Markdown(""" I will start with a short note on my understanding of Radames's app and the tools used in it -
 
 - His is a super cool and handy proof of concept of a simple video editor where you can edit a video by playing with its audio transcription (the ASR pipeline output).
 - Both of our apps use **Huggingface's [Automatic Speech Recognition Pipeline](https://huggingface.co/tasks/automatic-speech-recognition)**, built on top of the **Wav2Vec2** model, which internally uses CTC to improve predictions. The pipeline predicts the text transcription along with timestamps for every character and pause in the audio.
 - His app uses the FFmpeg library to a good extent to clip and merge videos. FFmpeg is an open-source suite of tools for handling video, audio, and other multimedia files. My app uses FFmpeg as well as MoviePy to do the bulk of the video+audio processing.
 
- Let me briefly take you through the code and process involved in building this app *step by step* 😉 lol -
+ Let me now briefly take you through the code and process involved in building this app *step by step* 😉 lol -
 - Firstly, I have used ffmpeg to extract audio from the video (this code line is taken directly from Radames's app above) -
 
 ```
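The extraction line itself is truncated by the hunk above; the following is a minimal sketch of that step, assuming the plain ffmpeg CLI is invoked via `subprocess` (the function name and file paths are placeholders, not the commit's actual code):

```
import subprocess

def extract_audio(video_path: str, audio_path: str = "audio.wav") -> str:
    # -vn drops the video stream; -ac 1 -ar 16000 produces the mono,
    # 16 kHz waveform that Wav2Vec2-based ASR checkpoints expect
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", audio_path],
        check=True,
    )
    return audio_path
```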
@@ -277,6 +276,9 @@ with demo:
 {'text': 'J', 'timestamp': [4.48, 4.52]},
 ```
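Character-level output like the sample above can be reproduced with Huggingface's ASR pipeline roughly as follows. This is a sketch: the checkpoint name is an assumption, since the diff does not show which Wav2Vec2 model the app actually loads.

```
from transformers import pipeline

# the checkpoint is illustrative; any CTC-based Wav2Vec2 model works here
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

result = asr("audio.wav", return_timestamps="char")
print(result["text"])        # full transcription
print(result["chunks"][:3])  # e.g. [{'text': 'J', 'timestamp': (4.48, 4.52)}, ...]
```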
 
+ - Next, using these character timestamps, I have extracted word timestamps (by taking the start timestamp of the first letter and the end timestamp of the last letter in any given word); see the sketch just below.
+ - Further, when a *sub-transcript* is provided for producing the GIF, I calculated the start and end timestamps for the whole group of words.
+
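A minimal sketch of that grouping step (a hypothetical helper, not the commit's own code): accumulate characters until a space chunk marks a word boundary, taking the first character's start and the last character's end as the word's timestamps.

```
def words_with_timestamps(chunks):
    # chunks: [{'text': 'J', 'timestamp': (4.48, 4.52)}, ...],
    # where a chunk of ' ' marks the pause between two words
    words, current, start, end = [], "", None, None
    for ch in chunks:
        if ch["text"] == " ":
            if current:
                words.append({"word": current, "start": start, "end": end})
                current, start = "", None
            continue
        if start is None:
            start = ch["timestamp"][0]
        current += ch["text"]
        end = ch["timestamp"][1]
    if current:
        words.append({"word": current, "start": start, "end": end})
    return words
```

The GIF window for a sub-transcript then falls out of these spans: the start of its first word and the end of its last word.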
 - I have then used the *moviepy* library to extract/concatenate videos into smaller clips and also to save the final processed video file as a .GIF image (a sketch follows the code block below).
 ```
 import moviepy.editor as mp
@@ -291,7 +293,17 @@ with demo:
 
 ```
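The body of that moviepy block is elided by the diff. A minimal sketch of the clip-and-GIF step, assuming moviepy 1.x's `moviepy.editor` API, with placeholder paths and timestamps:

```
import moviepy.editor as mp

# placeholder (start, end) windows for the selected words, in seconds
clip_times = [(4.48, 5.12), (6.30, 7.01)]

video = mp.VideoFileClip("input.mp4")
# cut one subclip per word window, then stitch them back together
subclips = [video.subclip(start, end) for start, end in clip_times]
final = mp.concatenate_videoclips(subclips)
final.write_gif("output.gif")
```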
 
+ While working on apps for the [Gradio Blocks Party](https://huggingface.co/Gradio-Blocks), I have gained a tremendous amount of knowledge about Gradio and Huggingface APIs and infrastructure. I was also able to polish my understanding of, and learn new things about, some of the key and most interesting ML areas: Question Answering, Sentence Transformers, Summarization, Image Generation, LLMs, Prompt Engineering, and now ASR and video processing.
+ I absolutely love Spaces. I believe Spaces is much more than a platform to showcase your ML demos; it can act as an ML product sandbox with the whole of Huggingface's might and infrastructure behind it. I believe Spaces can become a sort of playground for future ML products and ideas. All of this is extremely exciting.
+
+ Thanks for reading this far; I will see you at my next submission. Keep learning and sharing.
+
+ My last two Gradio Blocks Party apps can be found here -
 
+ - [Gradio-Blocks/GPTJ6B_Poetry_LatentDiff_Illustration](https://huggingface.co/spaces/Gradio-Blocks/GPTJ6B_Poetry_LatentDiff_Illustration), and
+ - [Gradio-Blocks/Ask_Questions_To_YouTube_Videos](https://huggingface.co/spaces/Gradio-Blocks/Ask_Questions_To_YouTube_Videos)
+
 """)
 
 button_transcript.click(generate_transcripts, input_video, [text_transcript, text_words, text_wordstimestamps])
 