Spaces:
Build error
updated blog
Browse files
app.py
CHANGED
@@ -229,14 +229,13 @@ with demo:
     examples.click(load_examples, examples, input_video)

     with gr.Row():
-        gr.Markdown("""
-        I will start with a short note on my understanding of Radames's app and the tools used in it -

         - His is a supercool and handy proof of concept of a simple video editor where you can edit a video by playing with its audio transcriptions (ASR pipeline output).
         - Both of our apps use **Hugging Face's [Automatic Speech Recognition pipeline](https://huggingface.co/tasks/automatic-speech-recognition)**, built over the **Wav2Vec2** model, which internally uses CTC to improve predictions. The pipeline predicts text transcriptions along with timestamps for every character and pause in the audio.
         - His app uses the FFmpeg library to a good extent to clip and merge videos. FFmpeg is an open-source library for video handling, consisting of a suite of functions for handling video, audio, and other multimedia files. My app uses FFmpeg as well as MoviePy to do the bulk of the video and audio processing.

-        Let me briefly take you through the code and the process involved in building this app *step by step* lol -
         - Firstly, I have used ffmpeg to extract audio from video (this code line is directly from Radames's app above) -

         ```
@@ -277,6 +276,9 @@ with demo:
         {'text': 'J', 'timestamp': [4.48, 4.52]},
         ```

         - I have then used the *moviepy* library to extract / concatenate videos into smaller clips and also to save the final processed video file as a .GIF image.
         ```
         import moviepy.editor as mp
@@ -291,7 +293,17 @@ with demo:

         ```


         """)

         button_transcript.click(generate_transcripts, input_video, [text_transcript, text_words, text_wordstimestamps ])
@@ -229,14 +229,13 @@ with demo:
     examples.click(load_examples, examples, input_video)

     with gr.Row():
+        gr.Markdown(""" I will start with a short note on my understanding of Radames's app and the tools used in it -

         - His is a supercool and handy proof of concept of a simple video editor where you can edit a video by playing with its audio transcriptions (ASR pipeline output).
         - Both of our apps use **Hugging Face's [Automatic Speech Recognition pipeline](https://huggingface.co/tasks/automatic-speech-recognition)**, built over the **Wav2Vec2** model, which internally uses CTC to improve predictions. The pipeline predicts text transcriptions along with timestamps for every character and pause in the audio.
         - His app uses the FFmpeg library to a good extent to clip and merge videos. FFmpeg is an open-source library for video handling, consisting of a suite of functions for handling video, audio, and other multimedia files. My app uses FFmpeg as well as MoviePy to do the bulk of the video and audio processing.

+        Let me now briefly take you through the code and the process involved in building this app *step by step* lol -
         - Firstly, I have used ffmpeg to extract audio from video (this code line is directly from Radames's app above) -

         ```
@@ -277,6 +276,9 @@ with demo:
         {'text': 'J', 'timestamp': [4.48, 4.52]},
         ```

+        - Next, using these character timestamps, I have extracted word timestamps (by taking the start timestamp of the first letter and the end timestamp of the last letter in any given word).
+        - Further, when a *sub-transcript* is provided for producing the GIF, I calculated the start and end timestamps for the whole group of words.
+
         - I have then used the *moviepy* library to extract / concatenate videos into smaller clips and also to save the final processed video file as a .GIF image.
         ```
         import moviepy.editor as mp
@@ -291,7 +293,17 @@ with demo:

         ```

+        While working on apps for the [Gradio Blocks Party](https://huggingface.co/Gradio-Blocks) I have gained a tremendous amount of knowledge about Gradio and the Hugging Face APIs and infrastructure. I was also able to polish my understanding of, and learn new things about, some of the key and most interesting ML topics - Question Answering, Sentence Transformers, Summarization, Image Generation, LLMs, Prompt Engineering, and now ASR and video processing.
+        I absolutely love Spaces. I believe Spaces is much more than a platform to showcase your ML demos - it can act as an ML product sandbox with the whole of Hugging Face's might and infrastructure behind it. I believe Spaces can become a playground for future ML products and ideas. All of this is extremely exciting.
+
+        Thanks for reading this far; I will see you at my next submission. Keep learning and sharing.
+
+        My last two Gradio Blocks Party apps can be found here -

+        - [Gradio-Blocks/GPTJ6B_Poetry_LatentDiff_Illustration](https://huggingface.co/spaces/Gradio-Blocks/GPTJ6B_Poetry_LatentDiff_Illustration), and
+        - [Gradio-Blocks/Ask_Questions_To_YouTube_Videos](https://huggingface.co/spaces/Gradio-Blocks/Ask_Questions_To_YouTube_Videos)
+
+
         """)

         button_transcript.click(generate_transcripts, input_video, [text_transcript, text_words, text_wordstimestamps ])
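The diff truncates the ffmpeg line the blog refers to, so here is a sketch of what that audio-extraction step typically looks like. The exact flags are an assumption (not taken from the app): `-vn` drops the video stream, and 16 kHz mono is the sample format Wav2Vec2 checkpoints expect.

```python
import subprocess

def ffmpeg_audio_cmd(video_path: str, audio_path: str = "audio.wav") -> list:
    # Build the ffmpeg command: -vn drops the video stream; -ar 16000 -ac 1
    # resamples to 16 kHz mono for Wav2Vec2 (flags are an assumption).
    return ["ffmpeg", "-y", "-i", video_path,
            "-vn", "-ar", "16000", "-ac", "1", audio_path]

# To actually run it (requires ffmpeg on PATH):
# subprocess.run(ffmpeg_audio_cmd("input.mp4"), check=True)
```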
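The char-to-word grouping the blog describes - start timestamp of the first letter, end timestamp of the last letter, with space chunks marking pauses - can be sketched as below. The function names are mine, not the app's, and the chunk shape follows the pipeline output quoted in the diff (`{'text': 'J', 'timestamp': [4.48, 4.52]}`).

```python
def word_timestamps(char_chunks):
    """Fold per-character ASR chunks into (word, start, end) triples.

    A space chunk (a pause) closes the current word; the word's start is the
    first letter's start timestamp, its end the last letter's end timestamp.
    """
    words, current = [], []
    # Append a sentinel space so the final word is flushed too.
    for chunk in char_chunks + [{"text": " ", "timestamp": [0, 0]}]:
        if chunk["text"] == " ":
            if current:
                words.append(("".join(c["text"] for c in current),
                              current[0]["timestamp"][0],
                              current[-1]["timestamp"][1]))
                current = []
        else:
            current.append(chunk)
    return words

def subtranscript_span(words, first_idx, last_idx):
    # Span for a selected group of words: start of the first, end of the last.
    return words[first_idx][1], words[last_idx][2]
```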
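The moviepy step - cutting the selected spans out of the video, joining them, and saving a GIF - might look roughly like this. Only the `import moviepy.editor as mp` line appears in the diff; the function name, its signature, and the use of `subclip`, `concatenate_videoclips`, and `write_gif` (moviepy 1.x API) are assumptions.

```python
import moviepy.editor as mp  # moviepy 1.x import, as shown in the diff

def spans_to_gif(video_path, spans, gif_path="output.gif"):
    # Cut each (start, end) span out of the source video, join the clips
    # in order, and write the result out as an animated GIF (hypothetical
    # helper; not the app's actual function).
    video = mp.VideoFileClip(video_path)
    clips = [video.subclip(start, end) for start, end in spans]
    final = mp.concatenate_videoclips(clips)
    final.write_gif(gif_path)
    return gif_path
```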