Update README.md
    	
        README.md
    CHANGED
    
    | 
         @@ -3,33 +3,27 @@ sdk: gradio 
     | 
|
| 3 | 
         
             
            sdk_version: 5.6.0
         
     | 
| 4 | 
         
             
            ---
         
     | 
| 5 | 
         
             
            # Whisper-WebUI
         
     | 
| 6 | 
         
            -
            A Gradio-based browser interface for [Whisper](https://github.com/openai/whisper). 
     | 
| 7 | 
         
            -
             
     | 
| 8 | 
         
            -
            
         
     | 
| 9 | 
         | 
| 10 | 
         
             
            ## Notebook
         
     | 
| 11 | 
         
             
            If you wish to try this on Colab, you can do it in [here](https://colab.research.google.com/github/jhj0517/Whisper-WebUI/blob/master/notebook/whisper-webui.ipynb)!
         
     | 
| 12 | 
         | 
| 13 | 
         
            -
            #  
     | 
| 14 | 
         
             
            - Select the Whisper implementation you want to use between :
         
     | 
| 15 | 
         
             
               - [openai/whisper](https://github.com/openai/whisper)
         
     | 
| 16 | 
         
             
               - [SYSTRAN/faster-whisper](https://github.com/SYSTRAN/faster-whisper) (used by default)
         
     | 
| 17 | 
         
             
               - [Vaibhavs10/insanely-fast-whisper](https://github.com/Vaibhavs10/insanely-fast-whisper)
         
     | 
| 18 | 
         
             
            - Generate subtitles from various sources, including :
         
     | 
| 19 | 
         
             
              - Files
         
     | 
| 20 | 
         
            -
              - Youtube
         
     | 
| 21 | 
         
             
              - Microphone
         
     | 
| 22 | 
         
            -
            - Currently supported  
     | 
| 23 | 
         
            -
              -  
     | 
| 24 | 
         
            -
              -  
     | 
| 25 | 
         
            -
              - txt 
     | 
| 26 | 
         
             
            - Speech to Text Translation 
         
     | 
| 27 | 
         
             
              - From other languages to English. ( This is Whisper's end-to-end speech-to-text translation feature )
         
     | 
| 28 | 
         
            -
            - Text to Text Translation
         
     | 
| 29 | 
         
             
              - Translate subtitle files using Facebook NLLB models
         
     | 
| 30 | 
         
            -
             
     | 
| 31 | 
         
            -
            - Pre-processing audio input with [Silero VAD](https://github.com/snakers4/silero-vad).
         
     | 
| 32 | 
         
            -
            - Pre-processing audio input to separate BGM with [UVR](https://github.com/Anjok07/ultimatevocalremovergui), [UVR-api](https://github.com/NextAudioGen/ultimatevocalremover_api). 
         
     | 
| 33 | 
         
             
            - Post-processing with speaker diarization using the [pyannote](https://huggingface.co/pyannote/speaker-diarization-3.1) model.
         
     | 
| 34 | 
         
             
               - To download the pyannote model, you need to have a Huggingface token and manually accept their terms in the pages below.
         
     | 
| 35 | 
         
             
                  1. https://huggingface.co/pyannote/speaker-diarization-3.1
         
     | 
| 
         @@ -107,15 +101,4 @@ This is Whisper's original VRAM usage table for models. 
     | 
|
| 107 | 
         
             
            | large  |   1550 M   |        N/A         |      `large`       |    ~10 GB     |       1x       |
         
     | 
| 108 | 
         | 
| 109 | 
         | 
| 110 | 
         
            -
            `.en` models are for English only, and the cool thing is that you can use the `Translate to English` option from the "large" models!
         
     | 
| 111 | 
         
            -
             
     | 
| 112 | 
         
            -
            ## TODO🗓
         
     | 
| 113 | 
         
            -
             
     | 
| 114 | 
         
            -
            - [x] Add DeepL API translation
         
     | 
| 115 | 
         
            -
            - [x] Add NLLB Model translation
         
     | 
| 116 | 
         
            -
            - [x] Integrate with faster-whisper
         
     | 
| 117 | 
         
            -
            - [x] Integrate with insanely-fast-whisper
         
     | 
| 118 | 
         
            -
            - [x] Integrate with whisperX ( Only speaker diarization part )
         
     | 
| 119 | 
         
            -
            - [x] Add background music separation pre-processing with [UVR](https://github.com/Anjok07/ultimatevocalremovergui)  
         
     | 
| 120 | 
         
            -
            - [ ] Add fast api script
         
     | 
| 121 | 
         
            -
            - [ ] Support real-time transcription for microphone
         
     | 
| 
         | 
|
| 3 | 
         
             
            sdk_version: 5.6.0
         
     | 
| 4 | 
         
             
            ---
         
     | 
| 5 | 
         
             
            # Whisper-WebUI
         
     | 
| 6 | 
         
            +
            A Gradio-based browser interface for [Whisper](https://github.com/openai/whisper).
         
     | 
| 
         | 
|
| 
         | 
|
| 7 | 
         | 
| 8 | 
         
             
            ## Notebook
         
     | 
| 9 | 
         
             
            If you wish to try this on Colab, you can do it in [here](https://colab.research.google.com/github/jhj0517/Whisper-WebUI/blob/master/notebook/whisper-webui.ipynb)!
         
     | 
| 10 | 
         | 
| 11 | 
         
            +
            # Features
         
     | 
| 12 | 
         
             
            - Select the Whisper implementation you want to use between :
         
     | 
| 13 | 
         
             
               - [openai/whisper](https://github.com/openai/whisper)
         
     | 
| 14 | 
         
             
               - [SYSTRAN/faster-whisper](https://github.com/SYSTRAN/faster-whisper) (used by default)
         
     | 
| 15 | 
         
             
               - [Vaibhavs10/insanely-fast-whisper](https://github.com/Vaibhavs10/insanely-fast-whisper)
         
     | 
| 16 | 
         
             
            - Generate subtitles from various sources, including :
         
     | 
| 17 | 
         
             
              - Files
         
     | 
| 
         | 
|
| 18 | 
         
             
              - Microphone
         
     | 
| 19 | 
         
            +
            - Currently supported output formats :
         
     | 
| 20 | 
         
            +
              - csv 
         
     | 
| 21 | 
         
            +
              - srt
         
     | 
| 22 | 
         
            +
              - txt
         
     | 
| 23 | 
         
             
            - Speech to Text Translation 
         
     | 
| 24 | 
         
             
              - From other languages to English. ( This is Whisper's end-to-end speech-to-text translation feature )
         
     | 
| 
         | 
|
| 25 | 
         
             
              - Translate subtitle files using Facebook NLLB models
         
     | 
| 26 | 
         
            +
            - Pre-processing audio input with [Silero VAD](https://github.com/snakers4/silero-vad). 
         
     | 
| 
         | 
|
| 
         | 
|
| 27 | 
         
             
            - Post-processing with speaker diarization using the [pyannote](https://huggingface.co/pyannote/speaker-diarization-3.1) model.
         
     | 
| 28 | 
         
             
               - To download the pyannote model, you need to have a Huggingface token and manually accept their terms in the pages below.
         
     | 
| 29 | 
         
             
                  1. https://huggingface.co/pyannote/speaker-diarization-3.1
         
     | 
| 
         | 
|
| 101 | 
         
             
            | large  |   1550 M   |        N/A         |      `large`       |    ~10 GB     |       1x       |
         
     | 
| 102 | 
         | 
| 103 | 
         | 
| 104 | 
         
            +
            `.en` models are for English only, and the cool thing is that you can use the `Translate to English` option from the "large" models!
         
     | 
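The `srt` output format listed under Features can be illustrated with a minimal sketch. This is not Whisper-WebUI's own writer. The function names `format_timestamp` and `segments_to_srt` are hypothetical, and the segment shape (dicts with `start`, `end`, and `text` keys, times in seconds) is an assumption made for illustration.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Render segments as SRT text: index, time range, then the caption.

    Assumed input shape (hypothetical): each segment is a mapping with
    'start' and 'end' in seconds and a 'text' string.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 1.25, "text": " Hello there."},
        {"start": 1.25, "end": 3.0, "text": " General Kenobi."},
    ]
    print(segments_to_srt(demo))
```

The timestamp is computed from whole milliseconds so that rounding happens once, before the value is split into hours, minutes, and seconds.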