ysharma HF staff committed on
Commit
e905a92
•
1 Parent(s): 3b2b7c8

update blog

Files changed (1)
  1. app.py +66 -9
app.py CHANGED
@@ -185,17 +185,9 @@ demo = gr.Blocks()
  with demo:
  gr.Markdown("""# **Create Any GIF From Your Favorite Videos!** """)
  gr.Markdown("""
- In this Gradio-Space Blog I will be taking you through my efforts in reproducing the brilliant app [Edit Video By Editing Text](https://huggingface.co/spaces/radames/edit-video-by-editing-text) by [@radames](https://huggingface.co/radames).
-
- My valule-add are -
+ In this Gradio-Space Blog I will be taking you through my efforts in reproducing the brilliant app [Edit Video By Editing Text](https://huggingface.co/spaces/radames/edit-video-by-editing-text) by [@radames](https://huggingface.co/radames). My value-adds are:
  - A permanent supply of your own new GIFs
  - This Space, written in the form of a Notebook (or a Blog, if I may), to help someone understand how they too can build this kind of an app.
-
- I will start with a short note about Radames's app and tools used in it -
- - It is a supercool and handy proof of concept of a simple video editor where you can edit a video by playing with its audio transcriptions (ASR pipeline output).
- - The app uses Huggingface [Automatic Speech Recognition Pipeline](https://huggingface.co/tasks/automatic-speech-recognition) build over Wav2Vec2 model using CTC which allows you to predict text transcriptions along with the timestamps for every characters and pauses.
- - The app uses FFmpeg library to a good extent to clip and merge videos. FFmpeg is an open-source library for video handling consisting of a suite of functions for handling video, audio, and other multimedia files.
-
  """)

  with gr.Row():
@@ -235,7 +227,72 @@
  return video[0]

  examples.click(load_examples, examples, input_video)
+
+ with gr.Row():
+ gr.Markdown("""
+ I will start with a short note on my understanding of Radames's app and the tools used in it:
+
+ - His app is a super cool and handy proof of concept of a simple video editor where you can edit a video by playing with its audio transcription (the ASR pipeline output).
+ - Both of our apps use Huggingface's **[Automatic Speech Recognition pipeline](https://huggingface.co/tasks/automatic-speech-recognition)**, built over a **Wav2Vec2** model which internally uses CTC to improve its predictions. The pipeline lets you predict the text transcription along with timestamps for every character and pause in the audio.
+ - His app makes heavy use of the FFmpeg library to clip and merge videos. FFmpeg is an open-source suite of tools for handling video, audio, and other multimedia files. My app uses FFmpeg as well as MoviePy to do the bulk of the video and audio processing.
+
+ Let me briefly take you through the code and the process involved in building this app *step by step* 😉 -
+ - Firstly, I have used ffmpeg to extract the audio from the video (this line is taken directly from Radames's app):
+
+ ```python
+ audio_memory, _ = ffmpeg.input(video_path).output(
+     '-', format="wav", ac=1, ar='16k'
+ ).overwrite_output().global_args('-loglevel', 'quiet').run(capture_stdout=True)
+ ```
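The `ac=1, ar='16k'` flags matter here because Wav2Vec2 expects 16 kHz mono audio. As a quick sanity check, the standard-library `wave` module can confirm that in-memory WAV bytes really have that shape (the `check_wav` helper below is my own addition for illustration, not part of either app):

```python
import io
import wave

def check_wav(audio_bytes, rate=16000, channels=1):
    """Return True if the in-memory WAV matches the sample rate / channel count."""
    with wave.open(io.BytesIO(audio_bytes)) as w:
        return w.getframerate() == rate and w.getnchannels() == channels

# Tiny stand-in WAV; in the app, the bytes come from the ffmpeg call above.
buf = io.BytesIO()
with wave.open(buf, 'wb') as w:
    w.setnchannels(1)
    w.setsampwidth(2)                   # 16-bit samples
    w.setframerate(16000)
    w.writeframes(b'\x00\x00' * 160)    # 10 ms of silence

print(check_wav(buf.getvalue()))  # True
```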
+ - Then I am calling the ASR model as a service, using the Accelerated Inference API. Below is the code snippet for doing so:
+
+ ```python
+ def query(in_audio):
+     payload = json.dumps({
+         "inputs": base64.b64encode(in_audio).decode("utf-8"),
+         "parameters": {
+             "return_timestamps": "char",
+             "chunk_length_s": 10,
+             "stride_length_s": [4, 2]
+         },
+         "options": {"use_gpu": False}
+     }).encode("utf-8")
+
+     response = requests.request("POST", API_URL, data=payload)
+
+     json_response = json.loads(response.content.decode("utf-8"))
+
+     return json_response
+ ```
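For reference, the JSON that comes back follows the transformers ASR pipeline output format: a full `text` field plus a `chunks` list of per-character timestamps. A tiny mock of that shape (the values here are invented for illustration; the real output is shown further below):

```python
import json

# Mock of the Inference API response for return_timestamps="char"
# (invented values; the real chunks appear later in this post).
raw = '{"text": "DO IT", "chunks": [{"text": "D", "timestamp": [2.36, 2.38]}]}'
json_response = json.loads(raw)

full_text = json_response["text"]       # the whole transcript
char_chunks = json_response["chunks"]   # one dict per character
print(full_text)  # DO IT
```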
+ - The transcript thus generated might have some words which are not correctly transcribed; for example, *tomorrow* comes out as 'to morrow', *hard at it* as 'hot ati', and so on. However, this won't hinder the use-case I am demoing here, so let's move on.
+
+ > do it just do it don't let your dreams be dreams yesterday you said to morrow so just do it make you dreams can't yro just do it some people dream of success while you're going to wake up and work hot ati nothing is impossible you should get to the point where any one else would quit and you're luck in a stop there no what are you waiting for do et jot do it just you can just do it if you're tired is starting over stop giving up
+
+ - The other output generated by this ASR pipeline is a list of character-timestamp dictionaries; look at the sample below to get an idea:
+
+ ```python
+ {'text': 'D', 'timestamp': [2.36, 2.38]},
+ {'text': 'O', 'timestamp': [2.52, 2.56]},
+ {'text': ' ', 'timestamp': [2.68, 2.72]},
+ {'text': 'I', 'timestamp': [2.84, 2.86]},
+ {'text': 'T', 'timestamp': [2.88, 2.92]},
+ {'text': ' ', 'timestamp': [2.94, 2.98]},
+ {'text': 'J', 'timestamp': [4.48, 4.52]},
+ ```
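The app's word-level outputs (`text_words`, `text_wordstimestamps`) can be derived from these character chunks by splitting on the space entries. A minimal sketch of that grouping (the `group_words` helper and its name are my own, not from either app):

```python
def group_words(char_chunks):
    """Group per-character ASR timestamps into (word, start, end) tuples."""
    words, current = [], []
    for chunk in char_chunks:
        if chunk['text'] == ' ':
            if current:
                words.append((''.join(c['text'] for c in current),
                              current[0]['timestamp'][0],
                              current[-1]['timestamp'][1]))
                current = []
        else:
            current.append(chunk)
    if current:  # flush the last word
        words.append((''.join(c['text'] for c in current),
                      current[0]['timestamp'][0],
                      current[-1]['timestamp'][1]))
    return words

# Sample characters from the pipeline output above
chars = [
    {'text': 'D', 'timestamp': [2.36, 2.38]},
    {'text': 'O', 'timestamp': [2.52, 2.56]},
    {'text': ' ', 'timestamp': [2.68, 2.72]},
    {'text': 'I', 'timestamp': [2.84, 2.86]},
    {'text': 'T', 'timestamp': [2.88, 2.92]},
]
print(group_words(chars))  # [('DO', 2.36, 2.56), ('IT', 2.84, 2.92)]
```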
+
+ - I have then used the *moviepy* library to extract and concatenate the smaller clips, and also to save the final processed video file as a .GIF image:
+
+ ```python
+ import moviepy.editor as mp
+
+ video = mp.VideoFileClip(video_path)
+ final_clip = video.subclip(start_seconds, end_seconds)
+
+ # write the clip out as a GIF and as an mp4
+ final_clip.write_gif("gifimage.gif")  # program='ffmpeg', tempfiles=True, fps=15, fuzz=3
+ final_clip.write_videofile("gifimage.mp4")
+ final_clip.close()
+ ```
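The `start_seconds` and `end_seconds` used above come from the word-level timestamps: you look up where the chosen phrase sits in the transcript. A minimal sketch of that lookup (the `find_clip_bounds` helper and the sample values are mine, for illustration only):

```python
def find_clip_bounds(words, times, phrase):
    """Return (start, end) seconds for the first occurrence of phrase
    (a list of words) inside the transcript word list."""
    n = len(phrase)
    for i in range(len(words) - n + 1):
        if [w.lower() for w in words[i:i + n]] == [w.lower() for w in phrase]:
            return times[i][0], times[i + n - 1][1]
    raise ValueError("phrase not found in transcript")

# Hypothetical word list and per-word [start, end] timestamps
words = ['just', 'do', 'it']
times = [[0.5, 0.9], [1.0, 1.2], [1.3, 1.6]]
print(find_clip_bounds(words, times, ['do', 'it']))  # (1.0, 1.6)
```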
+
+ """)

  button_transcript.click(generate_transcripts, input_video, [text_transcript, text_words, text_wordstimestamps])
  button_gifs.click(generate_gifs, [text_gif_transcript, text_words, text_wordstimestamps], out_gif)