adastmin committed on
Commit
597a3c5
1 Parent(s): 4be7c72

Upload 18 files

Files changed (18)
  1. .gitignore +15 -0
  2. README.md +144 -13
  3. Voice.py +159 -0
  4. app_state.py +7 -0
  5. diarize.py +75 -0
  6. dub_line.py +135 -0
  7. language_detection.py +13 -0
  8. loading subs pseudocode +20 -0
  9. main.py +0 -0
  10. requirements-linux310.txt +237 -0
  11. requirements-win-310.txt +0 -0
  12. requirements.txt +14 -0
  13. synth.py +33 -0
  14. test.py +12 -0
  15. utils.py +53 -0
  16. video.py +219 -0
  17. vocal_isolation.py +47 -0
  18. weeablind.py +163 -0
.gitignore ADDED
@@ -0,0 +1,15 @@
+ venv
+ __pycache__
+ .venv
+ output/
+ *.mkv
+ *.wav
+ *.mp3
+ *.mp4
+ *.webm
+ pretrained_models
+ tmp
+ dist
+ build
+ *.spec
+ audio_cache
README.md CHANGED
@@ -1,13 +1,144 @@
- ---
- title: Dubbing
- emoji: 🐠
- colorFrom: yellow
- colorTo: indigo
- sdk: gradio
- sdk_version: 4.8.0
- app_file: app.py
- pinned: false
- license: apache-2.0
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Weeablind
+
+ A program to dub multi-lingual media and anime using modern AI speech synthesis, diarization, language identification, and voice cloning.
+
+ ## Why
+
+ Many shows, movies, news segments, interviews, and videos will never receive proper dubs to other languages, and dubbing something from scratch can be an enormous undertaking. This presents a common accessibility hurdle for people with blindness, dyslexia, learning disabilities, or simply folks that don't enjoy reading subtitles. This program aims to create a pleasant alternative for folks facing these struggles.
+
+ This software is a product of war. My sister turned me onto my now-favorite comedy anime "The Disastrous Life of Saiki K." but Netflix never ordered a dub for the 2nd season. I'm blind and cannot and will not ever be able to read subtitles, but I MUST know how the story progresses! Netflix has forced my hand and I will bring AI-dubbed anime to the blind!
+
+ ## How
+
+ This project relies on some rudimentary slapping together of state-of-the-art technologies. It uses numerous audio processing libraries and techniques to analyze and synthesize speech that tries to stay in line with the source video file. It primarily relies on ffmpeg and pydub for audio and video editing, Coqui TTS for speech synthesis, speechbrain for language identification, and pyannote.audio for speaker diarization.
+
+ You have the option of dubbing every subtitle in the video, setting the start and end times, dubbing only foreign-language content, or full-blown multi-speaker dubbing with speaking rate and volume matching.
+
+ ## When?
+
+ This project is currently in what some might call alpha. The major, core functionality is in place, and it's possible to use by cloning the repo, but it's only starting to be ready for a first release. There are numerous optimizations, UX improvements, and refactors that need to be done before a first release candidate. Stay tuned for regular updates, and feel free to extend a hand with contributions, testing, or suggestions if this is something you're interested in.
+
+ ## The Name
+
+ I had the idea to call the software Weeablind as a portmanteau of Weeaboo (someone a little too obsessed with anime) and blind. I might change it to something else in the future like Blindtaku, DubHub, or something similar and catchier, because the software can be used for far more than just anime.
+
+ ## Setup
+
+ There are currently no prebuilt binaries to download; this is something I am looking into, but many of these dependencies are not easy to bundle with something like PyInstaller.
+
+ The program works best on Linux, but will also run on Windows.
+
+ ### System Prerequisites
+ You will need to install [FFmpeg](https://ffmpeg.org/download.html) on your system and make sure it's callable from the terminal or on your system PATH.
+
+ For using Coqui TTS, you will also need Espeak-NG, which you can get from your package manager on Linux or [here](https://github.com/espeak-ng/espeak-ng/releases) on Windows.
+
+ On Windows, pip requires the MSVC Build Tools to build Coqui. You can install them here:
+ https://visualstudio.microsoft.com/visual-cpp-build-tools/
+
+ Coqui TTS and Pyannote diarization will also both perform better if you have CUDA set up on your system to use your GPU. This should work out of the box on Linux, but getting it set up on Windows takes some doing. This [blog post](https://saturncloud.io/blog/how-to-run-mozilla-ttscoqui-tts-training-with-cuda-on-a-windows-system/) should walk you through the process. If you can't get it working, don't fret; you can still use them on your CPU.
+
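+ A quick way to confirm whether PyTorch can actually see your GPU (this is the same check `utils.py` in this repo uses to show the "GPU Detected?" label):
+
+ ```
+ # minimal sketch: True means CUDA is usable, False means everything falls back to CPU
+ from torch.cuda import is_available
+ print(is_available())
+ ```
+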
+ The latest version of Python works on Linux, but Spleeter only works on 3.10, and Pyannote can be finicky with that too. 3.10 seems to work the best on Windows. You can get it from the Microsoft Store.
+
+ ### Setup from Source
+ To use the project, you'll need to clone the repository and install the dependencies in a virtual environment.
+
+ ```
+ git clone https://github.com/FlorianEagox/weeablind.git
+ cd weeablind
+ python3.10 -m venv venv
+ # Windows
+ .\venv\Scripts\activate
+ # Linux
+ source ./venv/bin/activate
+ ```
+ This project has a lot of dependencies, and pip can struggle with conflicts, so it's best to install from the lock file like this:
+ ```
+ pip install -r requirements-win-310.txt --no-deps
+ ```
+ You can try from the regular requirements file, but it can take a heck of a long time and sometimes requires some rejiggering.
+
+ Installing the dependencies can take a hot minute and uses a lot of space (~8 GB).
+
+ If you don't need certain features, for instance language filtering, you can omit speechbrain from the requirements.
+
+ Once this is completed, you can run the program with
+
+ ```
+ python weeablind.py
+ ```
+
+ ## Usage
+ Start by either selecting a video from your computer or pasting a link to a YT video and pressing enter. It should download the video and load the subs and audio.
+
+ ### Loading a video
+ Once a video is loaded, you can preview the subtitles that will be dubbed. If the wrong language is loaded, or the wrong audio stream, switch to the streams tab and select the correct ones.
+
+ ### Cropping
+ You can specify a start and end time if you only need to dub a section of the video, for example to skip the opening theme and credits of a show. Use timecode syntax like 2:17 and press enter.
+
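+ For reference, the timecodes are parsed the same way as `utils.timecode_to_seconds` in this repo; a minimal sketch of that conversion:
+
+ ```
+ # minimal sketch of the timecode parsing: "2:17" -> 137.0 seconds
+ def timecode_to_seconds(timecode):
+ 	parts = list(map(float, timecode.split(':')))
+ 	seconds = parts[-1]
+ 	if len(parts) > 1:
+ 		seconds += parts[-2] * 60
+ 	if len(parts) > 2:
+ 		seconds += parts[-3] * 3600
+ 	return seconds
+
+ print(timecode_to_seconds("2:17"))  # 137.0
+ ```
+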
+ ### Configuring Voices
+ By default, a "Sample" voice should be initialized. You can play around with different configurations and test the voice before dubbing with the "Sample Voice" button in the "Configure Voices" tab. When you have parameters you're happy with, clicking "Update Voices" will re-assign it to that slot. If you choose the SYSTEM TTS engine, the program will use Windows' SAPI5 Narrator or Linux espeak voices by default. This is extremely fast but sounds very robotic. Selecting Coqui gives you a TON of options to play around with, but you will be prompted to download often very heavy TTS models. VCTK/VITS is my favorite model to dub with as it's very quick, even on CPU, and there are hundreds of speakers to choose from. It is loaded by default. If you have run diarization, you can select different voices from the listbox and change their properties as well.
+
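+ Under the hood, the default "Sample" voice is set up roughly like this (a sketch mirroring `app_state.py`; `p326` is just one of the VCTK speaker IDs and the output path is illustrative):
+
+ ```
+ from Voice import Voice
+
+ # Coqui-backed voice using the multi-speaker VCTK/VITS model
+ sample = Voice(Voice.VoiceType.COQUI, name="Sample")
+ sample.set_voice_params('tts_models/en/vctk/vits', 'p326')
+ sample.speak("Testing, testing.", "output/sample.wav")
+ ```
+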
+ ### Language Filtering
+ In the subtitles tab, you can filter the subtitles to exclude lines spoken in your selected language so only the foreign language gets dubbed. This is useful for multi-lingual videos, but not videos all in one language.
+
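+ The per-line check is essentially speechbrain's VoxLingua107 language-ID model run on a snippet of the source audio; a rough sketch based on `language_detection.py` (the wav path is a placeholder):
+
+ ```
+ from language_detection import detect_language
+
+ # returns a language name such as "English"; lines matching the excluded
+ # language stay in the background, everything else gets dubbed
+ if detect_language("output/video_snippet.wav") != "English":
+ 	print("this line would be dubbed")
+ ```
+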
+ ### Diarization
+ Running diarization will attempt to assign the correct speaker to all the subtitles and generate random voices for the total number of speakers detected. In the future, you'll be able to specify the diarization pipeline and number of speakers if you know ahead of time. Diarization is only useful for videos with multiple speakers, and the accuracy can vary massively.
+
+ ### Background Isolation
+ In the "Streams" tab, you can run vocal isolation, which will attempt to remove the vocals from your source video track but retain the background. If you're using a multi-lingual video and running language filtering as well, you'll need to run that first to keep the English (or whatever the source language is) vocals.
+
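+ Isolation is done with Spleeter's two-stem model, roughly like this (a condensed sketch of what `vocal_isolation.py` does; paths are illustrative):
+
+ ```
+ from spleeter.separator import Separator
+
+ # splits the exported audio into "-vocals" and "-accompaniment" wav files under ./output/
+ separator = Separator('spleeter:2stems')
+ separator.separate_to_file("output/MyVideo-audio.wav", './output/',
+ 	filename_format='{filename}-{instrument}.{codec}')
+ ```
+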
+ ### Dubbing
+ Once you've configured things how you like, you can press the big, JUICY run dubbing button. This can take a while to run. Once completed, you should have something like "MyVideo-dubbed.mkv" in the `output` directory. This is your finished video!
+
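+ Each synthesized line is overlaid onto a silent track at its subtitle's start time and then mixed back under the original (or isolated background) audio with ffmpeg's `amix` filter. A condensed sketch of the overlay step from `video.py` (here `lines` is a placeholder list of `(start_seconds, AudioSegment)` pairs):
+
+ ```
+ from pydub import AudioSegment
+
+ duration, lines = 30, []
+ dub_track = AudioSegment.silent(duration * 1000, frame_rate=22050)
+ for start, segment in lines:
+ 	# pydub overlay positions are in milliseconds
+ 	dub_track = dub_track.overlay(segment, position=start * 1000)
+ dub_track.export("output/dubtrack.wav", format="wav")
+ ```
+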
+ ## Things to do
+ - A better filtering system for language detection. Maybe inclusive and exclusive modes, or a confidence threshold
+ - Find some less copyrighted multi-lingual / non-English content to display demos publicly
+ - De-anglicize it so the user can select their target language instead of just English
+ - FIX PYDUB'S STUPID ARRAY DISTORTION so we don't have to perform 5 IO operations per dub!!!
+ - ~~run a vocal isolation / remover on the source audio to remove / mitigate the original speakers?~~
+ - ~~A proper setup guide for all platforms~~
+ - remove or fix the broken espeak implementation to be cross-platform
+ - ~~Uninitialized singletons for heavy models upon startup (e.g. only initialize pyannote/speechbrain pipelines when needed)~~
+ - Abstraction for singletons of Coqui voices using the same model to reduce memory footprint
+ - ~~GUI tab to list and select audio / subtitle streams w/ FFMPEG~~
+ - ~~Move the tabs into their own classes~~
+ - ~~Add labels and screen reader landmarks to all the controls~~
+ - ~~Single speaker or multi speaker control switch~~
+ - ~~Download YouTube video with Closed Captions~~
+ - ~~GUI to select start and end time for dubbing~~
+ - Throw up a Flask server on my website so you can try it with minimal features.
+ - Use OCR to generate subtitles for videos that don't have sub streams
+ - Use OCR for non-text based subtitles
+ - Make a cool logo?
+ - Learn how to package python programs as binaries to make releases
+ - ~~Remove the copyrighted content from this repo (sorry not sorry TV Tokyo)~~
+ - Save and import config files for later
+ - ~~Support for all subtitle formats~~
+ - Maybe slap in an ASR library for videos without subtitles?
+ - Maybe support for magnet URLs or the arrLib to pirate media (who knows???)
+
+ ### Diarization
+ - Filter subtitles by the selected voice from the listbox
+ - Select from multiple diarization models / pipelines
+ - Optimize audio tracks for diarization by isolating speech based on subtitle timings
+ - Investigate Diart?
+
+ ### TTS
+
+ - ~~Rework the speed control to use PyDub to speed up audio.~~
+ - ~~Match the volume of the speaker to TTS~~
+ - Checkbox to remove sequential subtitle entries and entries that are tiny, e.g. "nom" "nom" "nom" "nom"
+ - Investigate voice conversion?
+ - Build an asynchronous queue of operations to perform
+ - Started - Asynchronous GUI for Coqui model downloads
+ - Add support for MyCroft Mimic 3
+ - Add support for PiperTTS
+
+ ### Cloning
+ - Create a cloning mode to select subtitles and export them to a dataset or wav compilation for Coqui XTTS
+ - Use diaries and subtitles to isolate and build training datasets
+ - Build a tool to streamline the manual creation of datasets
+
+ ###### (oh god that's literally so many things, the scope of this has gotten so big how will this ever become a thing)
Voice.py ADDED
@@ -0,0 +1,159 @@
+ from enum import Enum, auto
+ import abc
+ import os
+ import threading
+ from time import sleep
+ from TTS.api import TTS
+ from TTS.utils import manage
+ import pyttsx3
+ from espeakng import ESpeakNG
+ import numpy as np
+ from torch.cuda import is_available
+ import time
+
+ class Voice(abc.ABC):
+ 	class VoiceType(Enum):
+ 		ESPEAK = "ESpeak"
+ 		COQUI = "Coqui TTS"
+ 		SYSTEM = "System Voices"
+
+ 	def __new__(cls, voice_type, init_args=[], name="Unnamed"):
+ 		if cls is Voice:
+ 			if voice_type == cls.VoiceType.ESPEAK:
+ 				return super().__new__(ESpeakVoice)
+ 			elif voice_type == cls.VoiceType.COQUI:
+ 				return super().__new__(CoquiVoice)
+ 			elif voice_type == cls.VoiceType.SYSTEM:
+ 				return super().__new__(SystemVoice)
+ 		else:
+ 			return super().__new__(cls)
+
+ 	def __init__(self, voice_type, init_args=[], name="Unnamed"):
+ 		self.voice_type = voice_type
+ 		self.name = name
+ 		self.voice_option = None
+
+ 	@abc.abstractmethod
+ 	def speak(self, text, file_name):
+ 		pass
+
+ 	def set_speed(self, speed):
+ 		pass
+
+ 	@abc.abstractmethod
+ 	def set_voice_params(self, voice=None, pitch=None):
+ 		pass
+
+ 	@abc.abstractmethod
+ 	def list_voice_options(self):
+ 		pass
+
+ 	def calibrate_rate(self):
+ 		output_path = './output/calibration.wav'
+ 		calibration_phrase_long = "In the early morning light, a vibrant scene unfolds as the quick brown fox jumps gracefully over the lazy dog. The fox's russet fur glistens in the sun, and its swift movements captivate onlookers. With a leap of agility, it soars through the air, showcasing its remarkable prowess. Meanwhile, the dog, relaxed and unperturbed, watches with half-closed eyes, acknowledging the fox's spirited display. The surrounding nature seems to hold its breath, enchanted by this charming spectacle. The gentle rustling of leaves and the distant chirping of birds provide a soothing soundtrack to this magical moment. The two animals, one lively and the other laid-back, showcase the beautiful harmony of nature, an ageless dance that continues to mesmerize all who witness it."
+ 		calibration_phrase_chair = "A chair is a piece of furniture with a raised surface used to sit on, commonly for use by one person. Chairs are most often supported by four legs and have a back; however, a chair can have three legs or could have a different shape. A chair without a back or arm rests is a stool, or when raised up, a bar stool."
+ 		calibration_phrase = "Hello? Testing, testing. Is.. is this thing on? Ah! Hello Gordon! I'm... assuming that's your real name... You wouldn't lie to us. Would you? Well... You finally did it! You survived the resonance cascade! You brought us all to hell and back, alive! You made it to the ultimate birthday bash at the end of the world! You beat the video game! And... now I imagine you'll... shut it down. Move on with your life. Onwards and upwards, ay Gordon? I don't.. know... how much longer I have to send this to you so I'll try to keep it brief. Not my specialty. Perhaps this is presumptuous of me but... Must this really be the end of our time together? Perhaps you could take the science team's data, transfer us somewhere else, hmm? Now... it doesn't have to be Super Punch-Out for the Super Nintendo Entertainment System. Maybe a USB drive, or a spare floppy disk. You could take us with you! We could see the world! We could... I'm getting a little ahead of myself, surely. Welp! The option's always there! You changed our lives, Gordon. I'd like to think it was for the better. And I don't know what's going to happen to us once you exit the game for good. But I know we'll never forget you. I hope you won't forget us. Well... This is where I get off. Goodbye Gordon!"
+ 		self.speak(calibration_phrase, output_path)
+
+ 	def get_wpm(words, duration):
+ 		return (len(words.split(' ')) / duration * 60)
+
+ class ESpeakVoice(Voice):
+ 	def __init__(self, init_args=[], name="Unnamed"):
+ 		super().__init__(Voice.VoiceType.ESPEAK, init_args, name)
+ 		self.set_voice_params(init_args)
+
+ 	def speak(self, text, file_name):
+ 		self.voice.synth_wav(text, file_name)
+
+ 	def set_speed(self, speed):
+ 		# current_speaker.set_speed(60*int((len(text.split(' ')) / (sub.end.total_seconds() - sub.start.total_seconds()))))
+ 		self.voice.speed = speed
+
+ 	def set_voice_params(self, voice=None, pitch=None):
+ 		if voice:
+ 			self.voice.voice = voice
+ 		if pitch:
+ 			self.voice.pitch = pitch
+
+ 	def list_voice_options(self):
+ 		# Optionally, you can return available voice options for ESpeak here
+ 		pass
+
+ class CoquiVoice(Voice):
+ 	def __init__(self, init_args=None, name="Coqui Voice"):
+ 		super().__init__(Voice.VoiceType.COQUI, init_args, name)
+ 		self.voice = TTS().to('cuda' if is_available() else 'cpu')
+ 		self.langs = ["All Languages"] + list({lang.split("/")[1] for lang in self.voice.list_models()})
+ 		self.langs.sort()
+ 		self.selected_lang = 'en'
+ 		self.is_multispeaker = False
+ 		self.speaker = None
+ 		self.speaker_wav = None
+
+ 	def speak(self, text, file_path=None):
+ 		if file_path:
+ 			return self.voice.tts_to_file(
+ 				text,
+ 				file_path=file_path,
+ 				speaker=self.speaker,
+ 				language='en' if self.voice.is_multi_lingual else None,
+ 				speaker_wav=self.speaker_wav
+ 			)
+ 		else:
+ 			return np.array(self.voice.tts(
+ 				text,
+ 				speaker=self.speaker,
+ 				language='en' if self.voice.is_multi_lingual else None
+ 			))
+
+ 	def set_voice_params(self, voice=None, speaker=None, speaker_wav=None, progress=None):
+ 		if voice and voice != self.voice_option:
+ 			if progress:
+ 				progress(0, "downloading")
+ 				download_thread = threading.Thread(target=self.voice.load_tts_model_by_name, args=(voice,))
+ 				download_thread.start()
+ 				while(download_thread.is_alive()):
+ 					# I'll remove this check if they accept my PR c:
+ 					bar = manage.tqdm_progress if hasattr(manage, "tqdm_progress") else None
+ 					if bar:
+ 						progress_value = int(100*(bar.n / bar.total))
+ 						progress(progress_value, "downloading")
+ 					time.sleep(0.25) # Adjust the interval as needed
+ 				progress(-1, "done!")
+ 			else:
+ 				self.voice.load_tts_model_by_name(voice)
+ 			self.voice_option = self.voice.model_name
+ 			self.is_multispeaker = self.voice.is_multi_speaker
+ 		self.speaker = speaker
+
+ 	def list_voice_options(self):
+ 		return self.voice.list_models()
+
+ 	def is_model_downloaded(self, model_name):
+ 		return os.path.exists(os.path.join(self.voice.manager.output_prefix, self.voice.manager._set_model_item(model_name)[1]))
+
+ 	def list_speakers(self):
+ 		return self.voice.speakers if self.voice.is_multi_speaker else []
+
+ class SystemVoice(Voice):
+ 	def __init__(self, init_args=[], name="Unnamed"):
+ 		super().__init__(Voice.VoiceType.SYSTEM, init_args, name)
+ 		self.voice = pyttsx3.init()
+ 		self.voice_option = self.voice.getProperty('voice')
+
+ 	def speak(self, text, file_name):
+ 		self.voice.save_to_file(text, file_name)
+ 		self.voice.runAndWait()
+ 		return file_name
+
+ 	def set_speed(self, speed):
+ 		self.voice.setProperty('rate', speed)
+
+ 	def set_voice_params(self, voice=None, pitch=None):
+ 		if voice:
+ 			self.voice.setProperty('voice', voice)
+ 			self.voice_option = self.voice.getProperty('voice')
+
+ 	def list_voice_options(self):
+ 		return [voice.name for voice in self.voice.getProperty('voices')]
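As a usage sketch: because `Voice.__new__` dispatches to the engine subclasses, all three engines are constructed through the same class. A minimal example with the system (pyttsx3) engine, which needs no model download (the output path is illustrative):

```
from Voice import Voice

# pyttsx3-backed OS voice; list_voice_options() returns the installed system voices
narrator = Voice(Voice.VoiceType.SYSTEM, name="Narrator")
print(narrator.list_voice_options())
narrator.speak("Hello from the system TTS engine.", "output/system_test.wav")
```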
app_state.py ADDED
@@ -0,0 +1,7 @@
+ from Voice import Voice
+
+ video = None
+ speakers = [Voice(Voice.VoiceType.COQUI, name="Sample")]
+ speakers[0].set_voice_params('tts_models/en/vctk/vits', 'p326') # p340
+ current_speaker = speakers[0]
+ sample_speaker = current_speaker
diarize.py ADDED
@@ -0,0 +1,75 @@
+ # This file contains all functions related to diarizing a video, including optimization and processing a speech diary (rttm file)
+ # These functions use a functional approach as I didn't want to bloat the video class with such specific functions
+ # Perhaps going forward I should abstract diary entries as their own objects similar to dub_line, but I haven't decided yet as diaries might be useful for voice cloning as well
+
+ import app_state
+ import utils
+ from Voice import Voice
+ from pyannote.audio import Pipeline
+ import torchaudio.transforms as T
+ import torchaudio
+ import random
+
+ pipeline = None
+
+ # Read RTTM files generated by Pyannote into an array containing the speaker, start, and end of their speech in the audio
+ def load_diary(file):
+ 	diary = []
+ 	with open(file, 'r', encoding='utf-8') as diary_file:
+ 		for line in diary_file.read().strip().split('\n'):
+ 			line_values = line.split(' ')
+ 			diary.append([line_values[7], float(line_values[3]), float(line_values[4])])
+ 	total_speakers = len(set(line[0] for line in diary))
+ 	app_state.speakers = initialize_speakers(total_speakers)
+ 	return diary
+
+ # Time-shift the speech diary to be in line with the start time
+ def update_diary_timing(diary, start_time):
+ 	return [[int(line[0].split('_')[1]), line[1] + start_time, line[2]] for line in diary]
+
+ def initialize_speakers(speaker_count):
+ 	speakers = []
+ 	speaker_options = app_state.sample_speaker.list_speakers()
+ 	for i in range(speaker_count):
+ 		speakers.append(Voice(Voice.VoiceType.COQUI, f"Voice {i}"))
+ 		speakers[i].set_voice_params('tts_models/en/vctk/vits', random.choice(speaker_options))
+ 	return speakers
+
+ def find_nearest_speaker(diary, sub):
+ 	return diary[
+ 		utils.find_nearest(
+ 			[diary_entry[1] for diary_entry in diary],
+ 			sub.start
+ 		)
+ 	][0]
+
+
+ def optimize_audio_diarization(video):
+ 	crop = video.crop_audio(True)
+ 	waveform, sample_rate = torchaudio.load(crop)
+ 	# Apply noise reduction
+ 	noise_reduce = T.Vad(sample_rate=sample_rate)
+ 	clean_waveform = noise_reduce(waveform)
+
+ 	# Normalize audio
+ 	normalize = T.Resample(orig_freq=sample_rate, new_freq=sample_rate)
+ 	normalized_waveform = normalize(clean_waveform)
+
+ 	return normalized_waveform, sample_rate
+
+ def run_diarization(video):
+ 	global pipeline # Probably should move this to app state?
+ 	if not pipeline:
+ 		pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.0", use_auth_token="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")
+ 		import torch
+ 		pipeline.to(torch.device("cuda"))
+ 	output = utils.get_output_path(video.file, ".rttm")
+ 	optimized, sample_rate = optimize_audio_diarization(video)
+ 	diarization = pipeline({"waveform": optimized, "sample_rate": sample_rate})
+ 	with open(output, "w") as rttm:
+ 		diarization.write_rttm(rttm)
+ 	diary = load_diary(output)
+ 	diary = update_diary_timing(diary, video.start_time)
+ 	for sub in video.subs_adjusted:
+ 		sub.voice = find_nearest_speaker(diary, sub)
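For context, an RTTM line written by pyannote looks roughly like the one below (values are illustrative); `load_diary` pulls out field 7 (the speaker label), field 3 (start time), and field 4 (duration):

```
# example RTTM line and how load_diary reads it
line = "SPEAKER myvideo 1 12.34 2.50 <NA> <NA> SPEAKER_00 <NA> <NA>"
values = line.split(' ')
entry = [values[7], float(values[3]), float(values[4])]
print(entry)  # ['SPEAKER_00', 12.34, 2.5]
```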
dub_line.py ADDED
@@ -0,0 +1,135 @@
+ from dataclasses import dataclass
+ from Voice import Voice
+ import ffmpeg
+ import utils
+ import app_state
+ import srt
+ from re import compile, sub as substitute
+ from pydub import AudioSegment
+ from audiotsm import wsola
+ from audiotsm.io.wav import WavReader, WavWriter
+ from audiotsm.io.array import ArrayReader, ArrayWriter
+ from speechbrain.pretrained import EncoderClassifier
+ import numpy as np
+ from language_detection import detect_language
+ remove_xml = compile(r'<[^>]+>|\{[^}]+\}')
+ language_identifier_model = None # EncoderClassifier.from_hparams(source="speechbrain/lang-id-voxlingua107-ecapa", savedir="tmp")
+
+ @dataclass
+ class DubbedLine:
+ 	start: float
+ 	end: float
+ 	text: str
+ 	index: int
+ 	voice: int = 0
+ 	language: str = ""
+
+ 	# This is highly inefficient as it writes and reads the same file many times
+ 	def dub_line_file(self, match_volume=True, output=False):
+ 		output_path = utils.get_output_path(str(self.index), '.wav', path='files')
+ 		tts_audio = app_state.speakers[self.voice].speak(self.text, output_path)
+ 		rate_adjusted = self.match_rate(tts_audio, self.end-self.start)
+ 		segment = AudioSegment.from_wav(rate_adjusted)
+ 		if match_volume:
+ 			segment = self.match_volume(app_state.video.get_snippet(self.start, self.end), segment)
+ 		if output:
+ 			segment.export(output_path, format='wav')
+ 		return segment
+
+ 	# This should ideally be a much more efficient way to dub.
+ 	# All functions should pass around numpy arrays rather than reading and writing files. For some reason though, it gives distorted results
+ 	def dub_line_ram(self, output=True):
+ 		output_path = utils.get_output_path(str(self.index), '.wav', path='files')
+ 		tts_audio = app_state.speakers[self.voice].speak(self.text)
+ 		rate_adjusted = self.match_rate_ram(tts_audio, self.end-self.start)
+ 		data = rate_adjusted / np.max(np.abs(rate_adjusted))
+ 		# This causes some kind of wacky audio distortion we NEED to fix ;C
+ 		# (one suspect: after normalizing to +/-1.0, scaling by 2**15 overflows int16 at the peaks)
+ 		audio_as_int = (data * (2**15)).astype(np.int16).tobytes()
+ 		segment = AudioSegment(
+ 			audio_as_int,
+ 			frame_rate=22050,
+ 			sample_width=2,
+ 			channels=1
+ 		)
+ 		if output:
+ 			segment.export(output_path, format='wav')
+ 		return segment
+
+ 	def match_rate(self, target_path, source_duration, destination_path=None, clamp_min=0, clamp_max=4):
+ 		if destination_path == None:
+ 			destination_path = target_path.split('.')[0] + '-timeshift.wav'
+ 		duration = float(ffmpeg.probe(target_path)["format"]["duration"])
+ 		rate = duration*1/source_duration
+ 		rate = np.clip(rate, clamp_min, clamp_max)
+ 		with WavReader(target_path) as reader:
+ 			with WavWriter(destination_path, reader.channels, reader.samplerate) as writer:
+ 				tsm = wsola(reader.channels, speed=rate)
+ 				tsm.run(reader, writer)
+ 		return destination_path
+
+ 	def match_rate_ram(self, target, source_duration, outpath=None, clamp_min=0.8, clamp_max=2.5):
+ 		num_samples = len(target)
+ 		target = target.reshape(1, num_samples)
+ 		duration = num_samples / 22050
+ 		rate = duration*1/source_duration
+ 		rate = np.clip(rate, clamp_min, clamp_max)
+ 		reader = ArrayReader(target)
+ 		tsm = wsola(reader.channels, speed=rate)
+ 		if not outpath:
+ 			rate_adjusted = ArrayWriter(channels=1)
+ 			tsm.run(reader, rate_adjusted)
+ 			return rate_adjusted.data
+ 		else:
+ 			rate_adjusted = WavWriter(outpath, 1, 22050)
+ 			tsm.run(reader, rate_adjusted)
+ 			rate_adjusted.close()
+ 			return outpath
+
+ 	def match_volume(self, source_snippet, target):
+ 		# ratio = source_snippet.rms / (target.rms | 1)
+ 		ratio = source_snippet.dBFS - target.dBFS
+ 		# adjusted_audio = target.apply_gain(ratio)
+ 		adjusted_audio = target + ratio
+ 		return adjusted_audio
+ 		# adjusted_audio.export(output_path, format="wav")
+
+ 	def get_language(self, source_snippet):
+ 		if not self.language:
+ 			self.language = detect_language(source_snippet)
+ 		return self.language
+
+
+ def filter_junk(subs, minimum_duration=0.1, remove_repeats=True):
+ 	filtered = []
+ 	previous = ""
+ 	for sub in subs:
+ 		if (sub.end - sub.start) > minimum_duration:
+ 			if sub.text != previous:
+ 				filtered.append(sub)
+ 		previous = sub.text
+ 	return filtered
+
+ # This function is designed to handle two cases
+ # 1 We just have a path to an srt that we want to import
+ # 2 You have a file containing subs, but not srt (a video file, a vtt, whatever)
+ # In this case, we must extract or convert the subs to srt, and then read it in (export then import)
+ def load_subs(import_path="", extract_subs_path=False, filter=True):
+ 	if extract_subs_path: # For importing an external subtitles file
+ 		(
+ 			ffmpeg
+ 			.input(extract_subs_path)
+ 			.output(import_path)
+ 			.global_args('-loglevel', 'error')
+ 			.run(overwrite_output=True)
+ 		)
+ 	with open(import_path, "r", encoding="utf-8") as f:
+ 		original_subs = list(srt.parse(f.read()))
+ 	return filter_junk([
+ 		DubbedLine(
+ 			sub.start.total_seconds(),
+ 			sub.end.total_seconds(),
+ 			substitute(remove_xml, '', sub.content),
+ 			sub.index
+ 		)
+ 		for sub in original_subs
+ 	])
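As a usage sketch, `load_subs` covers both cases described in its comment (the file names here are placeholders):

```
from dub_line import load_subs

# Case 1: an .srt we already have on disk
subs = load_subs("output/MyVideo.srt")
# Case 2: extract/convert subs from a video (or .vtt) into that .srt first, then load it
subs = load_subs("output/MyVideo.srt", "MyVideo.mkv")
print(subs[0].text, subs[0].start, subs[0].end)
```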
language_detection.py ADDED
@@ -0,0 +1,13 @@
+ # This is used to detect the spoken language in an audio file
+ # I wanted to abstract it to its own file, just like vocal isolation & diarization
+ from speechbrain.pretrained import EncoderClassifier
+
+ language_identifier_model = None
+
+ def detect_language(file):
+ 	global language_identifier_model
+ 	if not language_identifier_model:
+ 		language_identifier_model = EncoderClassifier.from_hparams(source="speechbrain/lang-id-voxlingua107-ecapa", savedir="tmp") #, run_opts={"device":"cuda"})
+ 	signal = language_identifier_model.load_audio(file)
+ 	prediction = language_identifier_model.classify_batch(signal)
+ 	return prediction[3][0].split(' ')[1]
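The classifier's text label comes back in a form like "en: English", which is why the last line splits on the space; a quick usage sketch (the wav path is a placeholder, and the VoxLingua107 model is downloaded on first use):

```
from language_detection import detect_language

# returns a plain language name such as "English"
print(detect_language("output/video_snippet.wav"))
```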
loading subs pseudocode ADDED
@@ -0,0 +1,20 @@
+ user loads video
+ 	video is file:
+ 		file has subs?
+ 			load the first subs
+ 			display all subs
+ 		user selects new subs:
+ 			load subs with given stream index
+ 	video is YT link:
+ 		download all subs (if any)
+ 		subs?
+ 			display the subs
+ 			user selects subs (vtt)
+ 				convert the subs to srt
+ 				load subs
+ 		there are no subs!?!?!:
+ 			This is the spooky zone
+ 			offer to upload a subtitle file?
+
+ 			offer to attempt video OCR???
+ 			attempt ASR + Translation? This would be fucking insane don't do this please don't add this feature this is literally impossible, right???
main.py ADDED
File without changes
requirements-linux310.txt ADDED
@@ -0,0 +1,237 @@
+ absl-py==2.0.0
+ accelerate==0.24.1
+ aiohttp==3.8.6
+ aiosignal==1.3.1
+ alembic==1.12.1
+ antlr4-python3-runtime==4.9.3
+ anyascii==0.3.2
+ anyio==3.7.1
+ appdirs==1.4.4
+ asteroid-filterbanks==0.4.0
+ astunparse==1.6.3
+ async-timeout==4.0.3
+ attrs==23.1.0
+ audioread==3.0.1
+ audiotsm==0.1.2
+ Babel==2.13.1
+ bangla==0.0.2
+ blinker==1.7.0
+ bnnumerizer==0.0.2
+ bnunicodenormalizer==0.1.6
+ Brotli==1.1.0
+ cachetools==5.3.2
+ certifi==2023.7.22
+ cffi==1.16.0
+ charset-normalizer==3.3.2
+ clean-fid==0.1.35
+ click==7.1.2
+ clip-anytorch==2.5.2
+ colorama==0.4.6
+ coloredlogs==15.0.1
+ colorlog==6.7.0
+ contourpy==1.1.1
+ coqpit==0.0.17
+ cycler==0.12.1
+ Cython==0.29.30
+ dateparser==1.1.8
+ decorator==5.1.1
+ docker-pycreds==0.4.0
+ docopt==0.6.2
+ einops==0.6.1
+ encodec==0.1.1
+ exceptiongroup==1.1.3
+ ffmpeg-python==0.2.0
+ filelock==3.13.1
+ Flask==2.3.3
+ flatbuffers==1.12
+ fonttools==4.43.1
+ frozenlist==1.4.0
+ fsspec==2023.6.0
+ ftfy==6.1.1
+ future==0.18.3
+ g2pkk==0.1.2
+ gast==0.4.0
+ gitdb==4.0.11
+ GitPython==3.1.40
+ google-auth==2.23.4
+ google-auth-oauthlib==0.4.6
+ google-pasta==0.2.0
+ greenlet==3.0.1
+ grpcio==1.59.2
+ gruut==2.2.3
+ gruut-ipa==0.13.0
+ gruut-lang-de==2.0.0
+ gruut-lang-en==2.0.0
+ gruut-lang-es==2.0.0
+ gruut-lang-fr==2.0.2
+ h11==0.12.0
+ h2==4.1.0
+ h5py==3.10.0
+ hpack==4.0.0
+ httpcore==0.13.7
+ httpx==0.19.0
+ huggingface-hub==0.18.0
+ humanfriendly==10.0
+ hyperframe==6.0.1
+ HyperPyYAML==1.2.2
+ idna==3.4
+ imageio==2.31.6
+ inflect==5.6.2
+ itsdangerous==2.1.2
+ jamo==0.4.1
+ jieba==0.42.1
+ Jinja2==3.1.2
+ joblib==1.3.2
+ jsonlines==1.2.0
+ jsonmerge==1.9.2
+ jsonschema==4.19.2
+ jsonschema-specifications==2023.7.1
+ julius==0.2.7
+ k-diffusion==0.0.16
+ keras==2.9.0
+ Keras-Preprocessing==1.1.2
+ kiwisolver==1.4.5
+ kornia==0.7.0
+ lazy_loader==0.3
+ libclang==16.0.6
+ librosa==0.10.0
+ lightning==2.1.0
+ lightning-utilities==0.9.0
+ llvmlite==0.40.1
+ Mako==1.2.4
+ Markdown==3.5.1
+ markdown-it-py==3.0.0
+ MarkupSafe==2.1.3
+ matplotlib==3.7.3
+ mdurl==0.1.2
+ mpmath==1.3.0
+ msgpack==1.0.7
+ multidict==6.0.4
+ mutagen==1.47.0
+ networkx==2.8.8
+ nltk==3.8.1
+ norbert==0.2.1
+ num2words==0.5.13
+ numba==0.57.0
+ numpy==1.22.0
+ nvidia-cublas-cu12==12.1.3.1
+ nvidia-cuda-cupti-cu12==12.1.105
+ nvidia-cuda-nvrtc-cu12==12.1.105
+ nvidia-cuda-runtime-cu12==12.1.105
+ nvidia-cudnn-cu12==8.9.2.26
+ nvidia-cufft-cu12==11.0.2.54
+ nvidia-curand-cu12==10.3.2.106
+ nvidia-cusolver-cu12==11.4.5.107
+ nvidia-cusparse-cu12==12.1.0.106
+ nvidia-nccl-cu12==2.18.1
+ nvidia-nvjitlink-cu12==12.3.52
+ nvidia-nvtx-cu12==12.1.105
+ oauthlib==3.2.2
+ omegaconf==2.3.0
+ onnxruntime-gpu==1.16.1
+ opt-einsum==3.3.0
+ optuna==3.4.0
+ packaging==23.1
+ pandas==1.5.3
+ pathtools==0.1.2
+ Pillow==10.0.1
+ platformdirs==3.11.0
+ pooch==1.8.0
+ primePy==1.3
+ protobuf==3.20.3
+ psutil==5.9.6
+ py-espeak-ng==0.1.8
+ pyannote.audio==3.0.1
+ pyannote.core==5.0.0
+ pyannote.database==5.0.1
+ pyannote.metrics==3.2.1
+ pyannote.pipeline==3.0.1
+ pyasn1==0.5.0
+ pyasn1-modules==0.3.0
+ pycparser==2.21
+ pycryptodomex==3.19.0
+ pydub==0.25.1
+ Pygments==2.16.1
+ pynndescent==0.5.10
+ pyparsing==3.1.1
+ pypinyin==0.49.0
+ pysbd==0.3.4
+ python-crfsuite==0.9.9
+ python-dateutil==2.8.2
+ pytorch-lightning==2.1.0
+ pytorch-metric-learning==2.3.0
+ pyttsx3==2.90
+ pytz==2023.3.post1
+ PyYAML==6.0.1
+ referencing==0.30.2
+ regex==2023.10.3
+ requests==2.31.0
+ requests-oauthlib==1.3.1
+ resize-right==0.0.2
+ rfc3986==1.5.0
+ rich==13.6.0
+ rpds-py==0.10.6
+ rsa==4.9
+ ruamel.yaml==0.18.4
+ ruamel.yaml.clib==0.2.8
+ safetensors==0.4.0
+ scikit-image==0.22.0
+ scikit-learn==1.3.0
+ scipy==1.11.3
+ semver==3.0.2
+ sentencepiece==0.1.99
+ sentry-sdk==1.34.0
+ setproctitle==1.3.3
+ shellingham==1.5.4
+ six==1.16.0
+ smmap==5.0.1
+ sniffio==1.3.0
+ sortedcontainers==2.4.0
+ soundfile==0.12.1
+ soxr==0.3.7
+ speechbrain==0.5.15
+ spleeter==2.4.0
+ SQLAlchemy==2.0.23
+ srt==3.5.3
+ sympy==1.12
+ tabulate==0.9.0
+ tbb==2021.10.0
+ tensorboard==2.9.1
+ tensorboard-data-server==0.6.1
+ tensorboard-plugin-wit==1.8.1
+ tensorboardX==2.6.2.2
+ tensorflow==2.9.3
+ tensorflow-estimator==2.9.0
+ tensorflow-io-gcs-filesystem==0.34.0
+ termcolor==2.3.0
+ threadpoolctl==3.2.0
+ tifffile==2023.9.26
+ tokenizers==0.13.3
+ torch==2.1.0
+ torch-audiomentations==0.11.0
+ torch-pitch-shift==1.2.4
+ torchaudio==2.1.0
+ torchdiffeq==0.2.3
+ torchmetrics==1.2.0
+ torchsde==0.2.6
+ torchvision==0.16.0
+ tqdm==4.64.1
+ trainer==0.0.31
+ trampoline==0.1.2
+ transformers==4.33.3
+ triton==2.1.0
+ TTS==0.19.1
+ typer==0.3.2
+ typing_extensions==4.8.0
+ tzlocal==5.2
+ umap-learn==0.5.4
+ Unidecode==1.3.7
+ urllib3==2.0.7
+ wandb==0.15.12
+ wcwidth==0.2.9
+ websockets==12.0
+ Werkzeug==3.0.1
+ wrapt==1.15.0
+ wxPython==4.2.1
+ yarl==1.9.2
+ yt-dlp==2023.10.13
requirements-win-310.txt ADDED
Binary file (8.92 kB)
 
requirements.txt ADDED
@@ -0,0 +1,14 @@
+ tts # <-- Coqui TTS engine
+ pyannote.audio
+ ffmpeg-python
+ srt
+ py-espeak-ng
+ pydub
+ # pyAudio # <--- Needed on Windows, breaks on Linux
+ -f https://extras.wxpython.org/wxPython4/extras/linux/gtk3/ubuntu-22.04
+ wxpython
+ pyttsx3 # <-- System TTS engine
+ yt-dlp # <-- Downloading YT vids
+ audiotsm # <-- Audio timestretching
+ speechbrain # <-- Audio Language Identification
+ spleeter # <-- Vocal / Background isolation
synth.py ADDED
@@ -0,0 +1,33 @@
+ # Formerly the prototypical file, synth. Now it's just a graveyard of functions that may never return?
+ from pydub import AudioSegment
+
+
+ import concurrent.futures
+ from utils import get_output_path
+
+
+ # This function was intended to run with multiprocessing, but Coqui won't play nice with that.
+ def dub_task(sub, i):
+ 	print(f"{i}/{len(subs_adjusted)}")
+ 	try:
+ 		return dub_line_ram(sub)
+ 		# empty_audio = empty_audio.overlay(line, sub.start*1000)
+ 	except Exception as e:
+ 		print(e)
+ 		with open(f"output/errors/{i}-rip.txt", 'w') as f:
+ 			f.write(str(e))
+ 		# total_errors += 1
+
+ # This may be used for multithreading?
+ def combine_segments():
+ 	empty_audio = AudioSegment.silent(total_duration * 1000, frame_rate=22050)
+ 	total_errors = 0
+ 	for sub in subs_adjusted:
+ 		print(f"{sub.index}/{len(subs_adjusted)}")
+ 		try:
+ 			segment = AudioSegment.from_file(f'output/files/{sub.index}.wav')
+ 			empty_audio = empty_audio.overlay(segment, sub.start*1000)
+ 		except:
+ 			total_errors += 1
+ 	empty_audio.export('new.wav')
+ 	print(total_errors)
test.py ADDED
@@ -0,0 +1,12 @@
+ # This file is just a quick script for whatever I'm testing at the time, it's not really important
+
+ # testing XTTS / VC models
+
+ from TTS.api import TTS
+ tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to('cuda')
+
+ # generate speech by cloning a voice using default settings
+ tts.tts_to_file(text="Welcome to DougDoug, where we solve problems that no one has",
+ 	file_path="/media/tessa/SATA SSD1/AI MODELS/cloning/output/doug.wav",
+ 	speaker_wav="/media/tessa/SATA SSD1/AI MODELS/cloning/doug.wav",
+ 	language="en")
utils.py ADDED
@@ -0,0 +1,53 @@
+ import os.path
+ import app_state
+ import numpy as np
+ from pydub.playback import play
+ from pydub import AudioSegment
+ from torch.cuda import is_available
+
+ APP_NAME = "WeeaBlind"
+ test_video_name = "./output/download.webm"
+ default_sample_path = "./output/sample.wav"
+ test_start_time = 94
+ test_end_time = 1324
+ gpu_detected = is_available()
+
+ def create_output_dir():
+ 	path = './output/files'
+ 	if not os.path.exists(path):
+ 		os.makedirs(path)
+
+ def get_output_path(input, suffix, prefix='', path=''):
+ 	filename = os.path.basename(input)
+ 	filename_without_extension = os.path.splitext(filename)[0]
+ 	return os.path.join(os.path.dirname(os.path.abspath(__file__)), 'output', path, f"{prefix}{filename_without_extension}{suffix}")
+
+ def timecode_to_seconds(timecode):
+ 	parts = list(map(float, timecode.split(':')))
+ 	seconds = parts[-1]
+ 	if len(parts) > 1:
+ 		seconds += parts[-2] * 60
+ 	if len(parts) > 2:
+ 		seconds += parts[-3] * 3600
+ 	return seconds
+
+ def seconds_to_timecode(seconds):
+ 	hours = int(seconds // 3600)
+ 	minutes = int((seconds % 3600) // 60)
+ 	seconds = seconds % 60
+ 	timecode = ""
+ 	if hours:
+ 		timecode += f"{hours}:"
+ 	if minutes or hours:
+ 		# keep a (zero-padded) minutes field whenever an hours field is present
+ 		timecode += f"{minutes:02}:" if hours else f"{minutes}:"
+ 	timecode = f"{timecode}{seconds:05.2f}"
+ 	return timecode
+
+ # Finds the closest element in an array to the given value
+ def find_nearest(array, value):
+ 	return (np.abs(np.asarray(array) - value)).argmin()
+
+ def sampleVoice(text, output=default_sample_path):
+ 	play(AudioSegment.from_file(app_state.sample_speaker.speak(text, output)))
+
+ snippet_export_path = get_output_path("video_snippet", "wav")
video.py ADDED
@@ -0,0 +1,219 @@
+ """
+ The Video class represents a reference to a video from either a file or web link. This class should implement the necessary info to dub a video.
+ """
+
+ from io import StringIO
+ import time
+ import ffmpeg
+ from yt_dlp import YoutubeDL
+ import utils
+ from pydub import AudioSegment
+ from dub_line import load_subs
+ import json
+ import numpy as np
+ import librosa
+ import soundfile as sf
+
+ class Video:
+ 	def __init__(self, video_URL, loading_progress_hook=print):
+ 		self.start_time = self.end_time = 0
+ 		self.downloaded = False
+ 		self.subs = self.subs_adjusted = self.subs_removed = []
+ 		self.background_track = self.vocal_track = None
+ 		self.speech_diary = self.speech_diary_adjusted = None
+ 		self.load_video(video_URL, loading_progress_hook)
+
+
+ 	# This is responsible for loading the app's audio and subtitles from a video file or YT link
+ 	def load_video(self, video_path, progress_hook=print):
+ 		sub_path = ""
+ 		if video_path.startswith("http"):
+ 			self.downloaded = True
+ 			try:
+ 				video_path, sub_path, self.yt_sub_streams = self.download_video(video_path, progress_hook)
+ 			except: return
+ 			progress_hook({"status":"complete"})
+ 		else:
+ 			self.downloaded = False
+ 		self.file = video_path
+ 		if not (self.downloaded and not sub_path):
+ 			try:
+ 				self.subs = self.subs_adjusted = load_subs(utils.get_output_path(self.file, '.srt'), sub_path or video_path)
+ 			except:
+ 				progress_hook({"status": "subless"})
+ 		self.audio = AudioSegment.from_file(video_path)
+ 		self.duration = float(ffmpeg.probe(video_path)["format"]["duration"])
+ 		if self.subs:
+ 			self.update_time(0, self.duration)
+
+ 	def download_video(self, link, progress_hook=print):
+ 		options = {
+ 			'outtmpl': 'output/%(id)s.%(ext)s',
+ 			'writesubtitles': True,
+ 			"subtitleslangs": ["all"],
+ 			"progress_hooks": (progress_hook,)
+ 		}
+ 		try:
+ 			with YoutubeDL(options) as ydl:
+ 				info = ydl.extract_info(link)
+ 				return ydl.prepare_filename(info), list(info["subtitles"].values())[0][-1]["filepath"] if info["subtitles"] else None, info["subtitles"]
+ 		except Exception as e:
+ 			print('AHHH\n',e,'\nAHHHHHH')
+ 			progress_hook({"status": "error", "error": e})
+ 			raise e
+
+
+ 	def update_time(self, start, end):
+ 		self.start_time = start
+ 		self.end_time = end
+ 		# clamp the subs to the crop time specified
+ 		start_line = utils.find_nearest([sub.start for sub in self.subs], start)
+ 		end_line = utils.find_nearest([sub.start for sub in self.subs], end)
+ 		self.subs_adjusted = self.subs[start_line:end_line]
+ 		if self.speech_diary:
+ 			self.update_diary_timing()
+
+ 	def list_streams(self):
+ 		probe = ffmpeg.probe(self.file)["streams"]
+ 		if self.downloaded:
+ 			subs = [{"name": stream[-1]['name'], "stream": stream[-1]['filepath']} for stream in self.yt_sub_streams.values()]
+ 		else:
+ 			subs = [{"name": stream['tags'].get('language', 'unknown'), "stream": stream['index']} for stream in probe if stream["codec_type"] == "subtitle"]
+ 		return {
+ 			"audio": [stream for stream in probe if stream["codec_type"] == "audio"],
+ 			"subs": subs
+ 		}
+
+ 	def get_snippet(self, start, end):
+ 		return self.audio[start*1000:end*1000]
+
+ 	# Crops the video's audio segment to reduce memory size
+ 	def crop_audio(self, isolated_vocals):
+ 		# ffmpeg -i .\saiki.mkv -vn -ss 84 -to 1325 crop.wav
+ 		source_file = self.vocal_track if isolated_vocals and self.vocal_track else self.file
+ 		# NOTE: source_file currently only influences the output name; the crop below still reads self.file
+ 		output = utils.get_output_path(source_file, "-crop.wav")
+ 		(
+ 			ffmpeg
+ 			.input(self.file, ss=self.start_time, to=self.end_time)
+ 			.output(output)
+ 			.global_args('-loglevel', 'error')
+ 			.global_args('-vn')
+ 			.run(overwrite_output=True)
+ 		)
+ 		return output
+
+ 	def filter_multilingual_subtiles(self, progress_hook=print, exclusion="English"):
+ 		multi_lingual_subs = []
+ 		removed_subs = []
+ 		# Speechbrain is being a lil bitch about this path on Windows all of the sudden
+ 		snippet_path = "video_snippet.wav" # utils.get_output_path('video_snippet', '.wav')
+ 		for i, sub in enumerate(self.subs_adjusted):
+ 			self.get_snippet(sub.start, sub.end).export(snippet_path, format="wav")
+ 			if sub.get_language(snippet_path) != exclusion:
+ 				multi_lingual_subs.append(sub)
+ 			else:
+ 				removed_subs.append(sub)
+ 			progress_hook(i, f"{i}/{len(self.subs_adjusted)}: {sub.text}")
+ 		self.subs_adjusted = multi_lingual_subs
+ 		self.subs_removed = removed_subs
+ 		progress_hook(-1, "done")
+
+ 	# This function is used to only get the snippets of the audio that appear in subs_adjusted after language filtration or cropping, regardless of the vocal splitting.
+ 	# This should be called AFTER filter multilingual and BEFORE vocal isolation. Not useful yet
+ 	# OKAY THERE HAS TO BE A FASTER WAY TO DO THIS X_X
+
+ 	# def isolate_subs(self):
+ 	# 	base = AudioSegment.silent(duration=self.duration*1000, frame_rate=self.audio.frame_rate, channels=self.audio.channels, frame_width=self.audio.frame_width)
+ 	# 	samples = np.array(base.get_array_of_samples())
+ 	# 	frame_rate = base.frame_rate
+
+ 	# 	for sub in self.subs_adjusted:
+ 	# 		copy = np.array(self.get_snippet(sub.start, sub.end).get_array_of_samples())
+ 	# 		start_sample = int(sub.start * frame_rate)
+ 	# 		end_sample = int(sub.end * frame_rate)
+
+ 	# 		# Ensure that the copy array has the same length as the region to replace
+ 	# 		copy = copy[:end_sample - start_sample] # Trim if necessary
+
+ 	# 		samples[start_sample:end_sample] = copy
+
+ 	# 	return AudioSegment(
+ 	# 		samples.tobytes(),
+ 	# 		frame_rate=frame_rate,
+ 	# 		sample_width=base.sample_width, # Adjust sample_width as needed (2 bytes for int16)
+ 	# 		channels=base.channels
+ 	# 	)
+
+ 	def isolate_subs(self, subs):
+ 		empty_audio = AudioSegment.silent(self.duration * 1000, frame_rate=self.audio.frame_rate)
+ 		empty_audio = self.audio
+ 		first_sub = subs[0]
+ 		empty_audio = empty_audio[0:first_sub.start].silent((first_sub.end-first_sub.start)*1000)
+ 		for i, sub in enumerate(subs[:-1]):
+ 			print(sub.text)
+ 			empty_audio = empty_audio[sub.end:subs[i+1].start].silent((subs[i+1].start-sub.end)*1000, frame_rate=empty_audio.frame_rate, channels=empty_audio.channels, sample_width=empty_audio.sample_width, frame_width=empty_audio.frame_width)
+
+ 		return empty_audio
+
+ 	def run_dubbing(self, progress_hook=None):
+ 		total_errors = 0
+ 		operation_start_time = time.process_time()
+ 		empty_audio = AudioSegment.silent(self.duration * 1000, frame_rate=22050)
+ 		status = ""
+ 		# with concurrent.futures.ThreadPoolExecutor(max_workers=100) as pool:
+ 		# 	tasks = [pool.submit(dub_task, sub, i) for i, sub in enumerate(subs_adjusted)]
+ 		# 	for future in concurrent.futures.as_completed(tasks):
+ 		# 		pass
+ 		for i, sub in enumerate(self.subs_adjusted):
+ 			status = f"{i}/{len(self.subs_adjusted)}"
+ 			progress_hook(i, f"{status}: {sub.text}")
+ 			try:
+ 				line = sub.dub_line_file(False)
+ 				empty_audio = empty_audio.overlay(line, sub.start*1000)
+ 			except Exception as e:
+ 				print(e)
+ 				total_errors += 1
+ 		self.dub_track = empty_audio.export(utils.get_output_path(self.file, '-dubtrack.wav'), format="wav").name
+ 		progress_hook(i+1, "Mixing New Audio")
+ 		self.mix_av(mixing_ratio=1)
+ 		progress_hook(-1)
+ 		print(f"TOTAL TIME TAKEN: {time.process_time() - operation_start_time}")
+ 		# print(total_errors)
+
+ 	# This runs an ffmpeg command to combine the audio, video, and subtitles with a specific ratio of how loud to make the dubtrack
+ 	def mix_av(self, mixing_ratio=1, dubtrack=None, output_path=None):
+ 		# i hate python, plz let me use self in func def
+ 		if not dubtrack: dubtrack = self.dub_track
+ 		if not output_path: output_path = utils.get_output_path(self.file, '-dubbed.mkv')
+
+ 		input_video = ffmpeg.input(self.file)
+ 		input_audio = input_video.audio
+ 		if self.background_track:
+ 			input_audio = ffmpeg.input(self.background_track)
+ 		input_dub = ffmpeg.input(dubtrack).audio
+
+ 		mixed_audio = ffmpeg.filter([input_audio, input_dub], 'amix', duration='first', weights=f"1 {mixing_ratio}")
+
+ 		output = (
+ 			# input_video['s']
+ 			ffmpeg.output(input_video['v'], mixed_audio, output_path, vcodec="copy", acodec="aac")
+ 			.global_args('-loglevel', 'error')
+ 			.global_args('-shortest')
+ 		)
+ 		ffmpeg.run(output, overwrite_output=True)
+
+ 	# Change the subs to either a file or a different stream from the video file
+ 	def change_subs(self, stream_index=-1):
+ 		if self.downloaded:
+ 			sub_path = list(self.yt_sub_streams.values())[stream_index][-1]['filepath']
+ 			self.subs = self.subs_adjusted = load_subs(utils.get_output_path(sub_path, '.srt'), sub_path)
+ 		else:
+ 			# ffmpeg -i output.mkv -map 0:s:1 frick.srt
+ 			sub_path = utils.get_output_path(self.file, '.srt')
+ 			ffmpeg.input(self.file).output(sub_path, map=f"0:s:{stream_index}").run(overwrite_output=True)
+ 			self.subs = self.subs_adjusted = load_subs(sub_path)
+
+ 	def change_audio(self, stream_index=-1):
+ 		audio_path = utils.get_output_path(self.file, f"-{stream_index}.wav")
+ 		ffmpeg.input(self.file).output(audio_path, map=f"0:a:{stream_index}").run(overwrite_output=True)
+ 		self.audio = AudioSegment.from_file(audio_path)
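As a usage sketch, the GUI in `weeablind.py` drives this class roughly as follows (the local path and crop times are placeholders, and the default Coqui voice from `app_state.py` is assumed to be initialized):

```
import app_state
from video import Video

# load a local file (or a YouTube URL), optionally crop, then dub
app_state.video = Video("MyVideo.mkv")
app_state.video.update_time(94, 1324)
app_state.video.run_dubbing(progress_hook=print)
# result: output/MyVideo-dubbed.mkv
```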
vocal_isolation.py ADDED
@@ -0,0 +1,47 @@
+ from spleeter.separator import Separator
+ from spleeter.audio import adapter
+ from pydub import AudioSegment
+ import numpy as np
+ import utils
+
+ separator = None # Separator('spleeter:2stems')
+ # I don't have any clue on how to make this work yet, just ignore for now. Ideally we'd never have to serialize the audio to wav and then re-read it, but alas, bad implementations of PCM will be the death of me
+ def seperate_ram(video):
+ 	audio_loader = adapter.AudioAdapter.default()
+ 	sample_rate = 44100
+ 	audio = video.audio
+ 	# arr = np.array(audio.get_array_of_samples(), dtype=np.float32).reshape((-1, audio.channels)) / (
+ 	# 	1 << (8 * audio.sample_width - 1)), audio.frame_rate
+ 	arr = np.array(audio.get_array_of_samples())
+ 	audio, _ = audio_loader.load_waveform(arr)
+ 	# waveform, _ = audio_loader.load('/path/to/audio/file', sample_rate=sample_rate)
+
+ 	print("base audio\n", base_audio, "\n")
+ 	# Perform the separation :
+ 	# prediction = separator.separate(audio)
+
+ def seperate_file(video, isolate_subs=True):
+ 	global separator
+ 	if not separator:
+ 		separator = Separator('spleeter:2stems')
+ 	source_audio_path = utils.get_output_path(video.file, '-audio.wav')
+ 	isolated_path = utils.get_output_path(video.file, '-isolate.wav')
+ 	separator.separate_to_file(
+ 		(video.audio).export(source_audio_path, format="wav").name,
+ 		'./output/',
+ 		filename_format='{filename}-{instrument}.{codec}'
+ 	)
+ 	# separator.separate_to_file(
+ 	# 	video.isolate_subs().export(source_audio_path, format="wav").name,
+ 	# 	'./output/',
+ 	# 	filename_format='{filename}-{instrument}.{codec}'
+ 	# )
+ 	background_track = utils.get_output_path(source_audio_path, '-accompaniment.wav')
+ 	# If we removed primary language subs from a multilingual video, we'll need to add them back to the background.
+ 	if video.subs_removed:
+ 		background = AudioSegment.from_file(background_track)
+ 		for sub in video.subs_removed:
+ 			background = background.overlay(video.get_snippet(sub.start, sub.end), int(sub.start*1000))
+ 		background.export(background_track, format="wav")
+ 	video.background_track = background_track
+ 	video.vocal_track = utils.get_output_path(isolated_path, '-vocals.wav')
weeablind.py ADDED
@@ -0,0 +1,163 @@
+ import wx
+ import wx.adv
+ from Voice import Voice
+ from pydub import AudioSegment
+ from pydub.playback import play
+ from tabs.ConfigureVoiceTab import ConfigureVoiceTab
+ from tabs.SubtitlesTab import SubtitlesTab
+ from tabs.ListStreams import ListStreamsTab
+ import threading
+ import utils
+ from video import Video
+ import app_state
+ from video import Video
+ import json
+
+ class GUI(wx.Panel):
+ 	def __init__(self, parent):
+ 		super().__init__(parent)
+
+ 		# Labels
+ 		lbl_title = wx.StaticText(self, label="WeeaBlind")
+ 		lbl_GPU = wx.StaticText(self, label=f"GPU Detected? {utils.gpu_detected}")
+ 		lbl_GPU.SetForegroundColour((0, 255, 0) if utils.gpu_detected else (255, 0, 0))
+ 		lbl_main_file = wx.StaticText(self, label="Choose a video file or link to a YouTube video:")
+ 		lbl_start_time = wx.StaticText(self, label="Start Time:")
+ 		lbl_end_time = wx.StaticText(self, label="End Time:")
+
+ 		# Controls
+ 		btn_choose_file = wx.Button(self, label="Choose File")
+ 		btn_choose_file.Bind(wx.EVT_BUTTON, self.open_file)
+
+ 		self.txt_main_file = wx.TextCtrl(self, style=wx.TE_PROCESS_ENTER, value=utils.test_video_name)
+ 		self.txt_main_file.Bind(wx.EVT_TEXT_ENTER, lambda event: self.load_video(self.txt_main_file.Value))
+
+ 		self.txt_start = wx.TextCtrl(self, style=wx.TE_PROCESS_ENTER, value=utils.seconds_to_timecode(0))
+ 		self.txt_end = wx.TextCtrl(self, style=wx.TE_PROCESS_ENTER, value=utils.seconds_to_timecode(0))
+ 		self.txt_start.Bind(wx.EVT_TEXT_ENTER, self.change_crop_time)
+ 		self.txt_end.Bind(wx.EVT_TEXT_ENTER, self.change_crop_time)
+
+ 		self.chk_match_volume = wx.CheckBox(self, label="Match Speaker Volume")
+ 		self.chk_match_volume.SetValue(True)
+
+ 		self.lb_voices = wx.ListBox(self, choices=[speaker.name for speaker in app_state.speakers])
+ 		self.lb_voices.Bind(wx.EVT_LISTBOX, self.on_voice_change)
+ 		self.lb_voices.Select(0)
+
+ 		tab_control = wx.Notebook(self)
+ 		self.tab_voice_config = ConfigureVoiceTab(tab_control, self)
+ 		tab_control.AddPage(self.tab_voice_config, "Configure Voices")
+ 		self.tab_subtitles = SubtitlesTab(tab_control, self)
+ 		tab_control.AddPage(self.tab_subtitles, "Subtitles")
+ 		self.streams_tab = ListStreamsTab(tab_control, self)
+ 		tab_control.AddPage(self.streams_tab, "Video Streams")
+ 		btn_run_dub = wx.Button(self, label="Run Dubbing!")
+ 		btn_run_dub.Bind(wx.EVT_BUTTON, self.run_dub)
+ 		sizer = wx.GridBagSizer(vgap=5, hgap=5)
+
+ 		sizer.Add(lbl_title, pos=(0, 0), span=(1, 2), flag=wx.CENTER | wx.ALL, border=5)
+ 		sizer.Add(lbl_GPU, pos=(0, 3), span=(1, 1), flag=wx.CENTER | wx.ALL, border=5)
+ 		sizer.Add(lbl_main_file, pos=(2, 0), span=(1, 2), flag=wx.LEFT | wx.TOP, border=5)
+ 		sizer.Add(self.txt_main_file, pos=(3, 0), span=(1, 2), flag=wx.EXPAND | wx.LEFT | wx.RIGHT | wx.BOTTOM, border=5)
+ 		sizer.Add(btn_choose_file, pos=(3, 2), span=(1, 1), flag=wx.ALIGN_RIGHT | wx.RIGHT | wx.BOTTOM, border=5)
+ 		sizer.Add(lbl_start_time, pos=(4, 0), flag=wx.LEFT | wx.TOP, border=5)
+ 		sizer.Add(self.txt_start, pos=(4, 1), flag= wx.TOP | wx.RIGHT, border=5)
+ 		sizer.Add(lbl_end_time, pos=(5, 0), flag=wx.LEFT | wx.TOP, border=5)
+ 		sizer.Add(self.txt_end, pos=(5, 1), flag= wx.TOP | wx.RIGHT, border=5)
+ 		sizer.Add(self.chk_match_volume, pos=(6, 0), span=(1, 2), flag=wx.LEFT | wx.TOP, border=5)
+ 		sizer.Add(self.lb_voices, pos=(7, 0), span=(1, 1), flag=wx.EXPAND | wx.LEFT | wx.TOP, border=5)
+ 		sizer.Add(tab_control, pos=(7, 1), span=(1, 3), flag=wx.EXPAND | wx.ALL, border=5)
+ 		sizer.Add(btn_run_dub, pos=(9, 2), span=(1, 1), flag=wx.ALIGN_RIGHT | wx.RIGHT | wx.BOTTOM, border=5)
+ 		sizer.AddGrowableCol(1)
+ 		self.tab_voice_config.update_voice_fields(None)
+
+ 		self.SetSizerAndFit(sizer)
+
+ 	def open_file(self, event):
+ 		dlg = wx.FileDialog(
+ 			frame, message="Choose a file",
+ 			wildcard="*.*",
+ 			style=wx.FD_OPEN | wx.FD_CHANGE_DIR
+ 		)
+ 		if dlg.ShowModal() == wx.ID_OK:
+ 			self.load_video(dlg.GetPath())
+ 		dlg.Destroy()
+
+ 	def load_video(self, video_path):
+ 		def update_ui():
+ 			self.txt_main_file.Value = app_state.video.file
+ 			self.txt_start.SetValue(utils.seconds_to_timecode(app_state.video.start_time))
+ 			self.txt_end.SetValue(utils.seconds_to_timecode(app_state.video.end_time))
+ 			self.tab_subtitles.create_entries()
+
+ 		def initialize_video(progress=True):
+ 			app_state.video = Video(video_path, update_progress if progress else print)
+ 			wx.CallAfter(update_ui)
+ 			wx.CallAfter(self.streams_tab.populate_streams, app_state.video.list_streams())
+
+ 		if video_path.startswith("http"):
+ 			dialog = wx.ProgressDialog("Downloading Video", "Download starting", 100, self)
+
+ 			def update_progress(progress=None):
+ 				status = progress['status'] if progress else "waiting"
+ 				total = progress.get("fragment_count", progress.get("total_bytes", 0))
+ 				if status == "downloading" and total:
+ 					completed = progress.get("fragment_index", progress.get("downloaded_bytes", 1))
+ 					percent_complete = int(100 * (completed / total))
+ 					wx.CallAfter(dialog.Update, percent_complete, f"{status}: {percent_complete}% \n {progress['info_dict'].get('fulltitle', '')}")
+ 				elif status == "complete":
+ 					if dialog:
+ 						wx.CallAfter(dialog.Destroy)
+ 				elif status == "error":
+ 					wx.CallAfter(wx.MessageBox,
+ 						f"Failed to download video with the following Error:\n {str(progress['error'])}",
+ 						"Error",
+ 						wx.ICON_ERROR
+ 					)
+ 					update_progress({"status": "complete"})
+
+ 			threading.Thread(target=initialize_video).start()
+ 		else:
+ 			initialize_video(False)
+
+ 	def change_crop_time(self, event):
+ 		app_state.video.update_time(
+ 			utils.timecode_to_seconds(self.txt_start.Value),
+ 			utils.timecode_to_seconds(self.txt_end.Value)
+ 		)
+ 		self.tab_subtitles.create_entries()
+
+ 	def update_voices_list(self):
+ 		self.lb_voices.Set([speaker.name for speaker in app_state.speakers])
+ 		self.lb_voices.Select(self.lb_voices.Strings.index(app_state.current_speaker.name))
+
+ 	def on_voice_change(self, event):
+ 		app_state.current_speaker = app_state.speakers[self.lb_voices.GetSelection()]
+ 		app_state.sample_speaker = app_state.current_speaker
+ 		self.tab_voice_config.update_voice_fields(event)
+
+ 	def run_dub(self, event):
+ 		progress_dialog = wx.ProgressDialog(
+ 			"Dubbing Progress",
+ 			"Starting...",
+ 			maximum=len(app_state.video.subs_adjusted) + 1, # +1 for combining phase
+ 			parent=self,
+ 			style=wx.PD_APP_MODAL | wx.PD_AUTO_HIDE
+ 		)
+ 		dub_thread = None
+ 		def update_progress(i, text=""):
+ 			if i == -1:
+ 				return wx.CallAfter(progress_dialog.Destroy)
+ 			wx.CallAfter(progress_dialog.Update, i, text)
+
+ 		dub_thread = threading.Thread(target=app_state.video.run_dubbing, args=(update_progress,))
+ 		dub_thread.start()
+
+ if __name__ == '__main__':
+ 	utils.create_output_dir()
+ 	app = wx.App(False)
+ 	frame = wx.Frame(None, wx.ID_ANY, utils.APP_NAME, size=(800, 800))
+ 	frame.Center()
+ 	gui = GUI(frame)
+ 	frame.Show()
+ 	app.MainLoop()