Space: LAP-DEV/Demo


LAP-DEV committed on Feb 12

Commit c435fe8 (verified) · 1 Parent(s): 3b9e233

Update README.md


Files changed (1)

README.md +8 -25


@@ -3,33 +3,27 @@ sdk: gradio
 sdk_version: 5.6.0
 ---
 # Whisper-WebUI
-A Gradio-based browser interface for [Whisper](https://github.com/openai/whisper). You can use it as an Easy Subtitle Generator!
-![Whisper WebUI](https://github.com/jhj0517/Whsiper-WebUI/blob/master/screenshot.png)
 ## Notebook
 If you wish to try this on Colab, you can do it in [here](https://colab.research.google.com/github/jhj0517/Whisper-WebUI/blob/master/notebook/whisper-webui.ipynb)!
-# Feature
 - Select the Whisper implementation you want to use between :
    - [openai/whisper](https://github.com/openai/whisper)
    - [SYSTRAN/faster-whisper](https://github.com/SYSTRAN/faster-whisper) (used by default)
    - [Vaibhavs10/insanely-fast-whisper](https://github.com/Vaibhavs10/insanely-fast-whisper)
 - Generate subtitles from various sources, including :
   - Files
-  - Youtube
   - Microphone
-- Currently supported subtitle formats :
-  - SRT
-  - WebVTT
-  - txt ( only text file without timeline )
 - Speech to Text Translation
   - From other languages to English. ( This is Whisper's end-to-end speech-to-text translation feature )
-- Text to Text Translation
   - Translate subtitle files using Facebook NLLB models
-  - Translate subtitle files using DeepL API
-- Pre-processing audio input with [Silero VAD](https://github.com/snakers4/silero-vad).
-- Pre-processing audio input to separate BGM with [UVR](https://github.com/Anjok07/ultimatevocalremovergui), [UVR-api](https://github.com/NextAudioGen/ultimatevocalremover_api).
 - Post-processing with speaker diarization using the [pyannote](https://huggingface.co/pyannote/speaker-diarization-3.1) model.
    - To download the pyannote model, you need to have a Huggingface token and manually accept their terms in the pages below.
       1. https://huggingface.co/pyannote/speaker-diarization-3.1
@@ -107,15 +101,4 @@ This is Whisper's original VRAM usage table for models.
 | large  |   1550 M   |        N/A         |      `large`       |    ~10 GB     |       1x       |
-`.en` models are for English only, and the cool thing is that you can use the `Translate to English` option from the "large" models!
-## TODO🗓
-- [x] Add DeepL API translation
-- [x] Add NLLB Model translation
-- [x] Integrate with faster-whisper
-- [x] Integrate with insanely-fast-whisper
-- [x] Integrate with whisperX ( Only speaker diarization part )
-- [x] Add background music separation pre-processing with [UVR](https://github.com/Anjok07/ultimatevocalremovergui)
-- [ ] Add fast api script
-- [ ] Support real-time transcription for microphone

 sdk_version: 5.6.0
 ---
 # Whisper-WebUI
+A Gradio-based browser interface for [Whisper](https://github.com/openai/whisper).
 ## Notebook
 If you wish to try this on Colab, you can do it in [here](https://colab.research.google.com/github/jhj0517/Whisper-WebUI/blob/master/notebook/whisper-webui.ipynb)!
+# Features
 - Select the Whisper implementation you want to use between :
    - [openai/whisper](https://github.com/openai/whisper)
    - [SYSTRAN/faster-whisper](https://github.com/SYSTRAN/faster-whisper) (used by default)
    - [Vaibhavs10/insanely-fast-whisper](https://github.com/Vaibhavs10/insanely-fast-whisper)
 - Generate subtitles from various sources, including :
   - Files
   - Microphone
+- Currently supported output formats :
+  - csv
+  - srt
+  - txt
 - Speech to Text Translation
   - From other languages to English. ( This is Whisper's end-to-end speech-to-text translation feature )
   - Translate subtitle files using Facebook NLLB models
+- Pre-processing audio input with [Silero VAD](https://github.com/snakers4/silero-vad).
 - Post-processing with speaker diarization using the [pyannote](https://huggingface.co/pyannote/speaker-diarization-3.1) model.
    - To download the pyannote model, you need to have a Huggingface token and manually accept their terms in the pages below.
       1. https://huggingface.co/pyannote/speaker-diarization-3.1
 | large  |   1550 M   |        N/A         |      `large`       |    ~10 GB     |       1x       |
+`.en` models are for English only, and the cool thing is that you can use the `Translate to English` option from the "large" models!
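
The updated README lists csv, srt, and txt as output formats but the diff does not show how they are produced. The following is only a rough sketch of what those three renderings could look like, not the project's actual code. The `(start, end, text)` segment tuples, the function names, the CSV column names, and the assumption that txt output carries text only are all hypothetical and chosen for illustration.

```python
import csv
import io

# Hypothetical segment structure: (start_seconds, end_seconds, text).
# The real Whisper-WebUI data model is not shown in this diff.
segments = [
    (0.0, 2.5, "Hello there."),
    (2.5, 5.042, "Welcome to the demo."),
]


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segs) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segs, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


def to_txt(segs) -> str:
    """Render only the text of each segment, one per line (assumed layout)."""
    return "\n".join(text for _, _, text in segs) + "\n"


def to_csv(segs) -> str:
    """Render segments as CSV with assumed columns: start, end, text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["start", "end", "text"])
    for start, end, text in segs:
        writer.writerow([start, end, text])
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_srt(segments))
    print(to_csv(segments))
    print(to_txt(segments))
```

Keeping each format in its own small function means a further format could be added without touching the others. The timestamp helper works in integer milliseconds to avoid floating-point drift when splitting hours, minutes, and seconds.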