Generating transcriptions in batches

#17
by Robis - opened

Would it be possible to create a new Tab, that's meant for batch processing?

It would work like this:
1. You insert / upload multiple videos.
2. It goes through all of the videos and creates separate output files for each one.
3. It could offer the option to download each file separately, or a ZIP of each file format (SRT etc.).
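The requested flow could be sketched roughly like this. This is illustrative only, not the WebUI's actual code: `transcribe_to_srt` is a placeholder for whatever Whisper call the backend makes, and the file names are made up.

```python
# Sketch of the requested batch flow (not whisper-webui's implementation).
import zipfile
from pathlib import Path

def transcribe_to_srt(video_path: Path) -> str:
    """Placeholder: return SRT text for one video (swap in a real Whisper call here)."""
    return "1\n00:00:00,000 --> 00:00:01,000\n(transcript of %s)\n" % video_path.name

def batch_transcribe(video_paths, out_dir="output", zip_name="transcripts.zip"):
    """Transcribe each video to its own SRT file, then bundle them into one ZIP."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    srt_files = []
    for video in map(Path, video_paths):
        srt_path = out / (video.stem + ".srt")   # one output file per video
        srt_path.write_text(transcribe_to_srt(video))
        srt_files.append(srt_path)
    zip_path = out / zip_name
    with zipfile.ZipFile(zip_path, "w") as zf:
        for srt in srt_files:
            zf.write(srt, arcname=srt.name)      # flat ZIP, just the SRT names
    return zip_path
```

Bundling everything into one ZIP is what makes this practical in a hosted UI: one click to download the whole batch instead of one download per file.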

I regularly use this space in Google Colab to transcribe short (1 min) videos, and sometimes I have 5-10 or more videos to process. I could be doing something useful while they process, but instead I have to sit and wait for each one to finish before uploading and downloading the next.

This AI, and this space, is amazing by the way 💯😄

This is actually possible already, though perhaps the feature is a bit under-documented ...

You can upload multiple files in the "Upload Files" field, and the WebUI will process each file in sequence and output them either as separate SRT files or as a single ZIP file with all the output.

You can also do something similar in the URL field by passing in a URL to a playlist with multiple videos. In that case, the WebUI should download the audio of all the videos in the playlist, then process them one by one.

Wow! I didn't think to try dropping multiple files at once. 😄 One by one doesn't work, because there is no drop zone afterwards.

Thanks for a quick answer!

I was just testing it using:
1-minute MP4 files.
Model: Large
Task: Transcribe
VAD: None

I tried uploading 10 and later 3 videos (each 1 minute), and during "queue: 1/1" (it usually gets to about 80%) the GUI shows an error: "Connection errored out".

I tried 2 videos, and it's working as expected.

--

I'm now trying 3 videos with Silero VAD. It seems to run indefinitely (I waited until 500/60).
I tried another 3 videos with Silero VAD, and it gives the same "Connection errored out" error.

Is this on the latest version with "queuing" enabled? Try updating the git repository if you haven't in a while.

EDIT: Ah, I see you mentioned "queue 1/1", so this is with queuing. You could try disabling queuing by setting queue_concurrency_count to -1 in config.json5, or by starting app.py with --queue_concurrency_count -1.
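For reference, the config.json5 entry could look something like this (a sketch with all other settings omitted; the option name is taken from the flag above):

```json5
// config.json5 in the whisper-webui root (other settings omitted)
{
  // -1 disables queuing entirely
  "queue_concurrency_count": -1,
}
```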

I've never had any issues with timeouts, even when I'm connected to the WebUI through a VPN. Is there a proxy server in between the UI and your computer? How are you running the WebUI in the first place?

I run it in Google Colab using this notebook that I found last year:

!git clone https://huggingface.co/spaces/aadnk/whisper-webui
!cd whisper-webui/ && git pull origin
!cd whisper-webui/ && pip install -r requirements.txt
!cd whisper-webui/ && python app-shared.py

I tried putting --queue_concurrency_count -1 at the end of the last command:

!cd whisper-webui/ && python app-shared.py --queue_concurrency_count -1

It didn't work (the queue was still showing), but it transcribed 3 videos. Then I tried 9 videos and got the connection error after a while.

Then I added "queue_concurrency_count": -1 to config.json5, and it worked (the queue disappeared, and it only counted numbers). I managed to transcribe 9 videos.

Then, without restarting, I tried 15 videos, and it seems it just got stuck (maybe I didn't wait long enough, but the console didn't show "Transcribing filename..." like it did the first time).

Then I re-ran the last command and tried the 15 videos again, and it seems to be getting stuck; "Transcribing filename..." doesn't show up. I didn't try rerunning everything from scratch yet.


I don't know if it matters, but some time ago I started getting these warnings in the last step:

2023-03-27 02:50:52.655203: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-27 02:50:52.655400: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-03-27 02:50:52.655424: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
IMPORTANT: You are using gradio version 3.13.0, however version 3.14.0 is available, please upgrade.
--------
/usr/local/lib/python3.9/dist-packages/gradio/blocks.py:717: UserWarning: api_name predict already exists, using predict_1

I've recently updated Gradio to 3.23.0 in order to add a better progress bar. Can you try restarting your Colab and see if that works better?

I just tried my Colab notebook, and I was able to transcribe three files of length 4, 7 and 8 minutes. How long are the 15 videos you are transcribing on average?

Also, have you tried extracting the audio from the files, and just uploading that, instead of the entire video files?
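For anyone wanting to try the audio-extraction suggestion, here is a minimal sketch that pre-extracts 16 kHz mono WAV files with ffmpeg before uploading, so only the audio track has to be transferred. It requires ffmpeg on PATH; file names and the output directory are illustrative.

```python
# Sketch: pre-extract audio from videos with ffmpeg (must be on PATH).
import subprocess
from pathlib import Path

def extract_audio_cmd(video: str, out_dir: str = "audio") -> list:
    """Build an ffmpeg command that drops the video track (-vn) and
    resamples the audio to 16 kHz mono WAV, Whisper's native input rate."""
    wav = str(Path(out_dir) / (Path(video).stem + ".wav"))
    return ["ffmpeg", "-y", "-i", video, "-vn", "-ac", "1", "-ar", "16000", wav]

def extract_all(videos, out_dir: str = "audio"):
    """Run ffmpeg for each video in turn."""
    Path(out_dir).mkdir(exist_ok=True)
    for video in videos:
        subprocess.run(extract_audio_cmd(video, out_dir), check=True)
```

Uploading a 1-minute WAV instead of a full MP4 keeps the transfer small, which may also help with the connection timeouts.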

It seems to be working better now; I'll try to test more if I have a chance later.

Robis changed discussion status to closed
