---
license: gpl-3.0
language:
- en
metrics:
- wer
---

# LipNet Phonemes Predictors

The project was developed with Python 3.8 on Ubuntu 24.04 (Linux).

Run `python -m pip install -r requirements.txt` to make sure your dependencies match the pinned versions.

The lists of video files used for training and validation when training standard LipNet (not the phonemes predictor) are in unseen_train.txt and unseen_test.txt respectively.

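These split files are plain text. As a minimal sketch of how they can be consumed (assuming one video sample path per line, which is an assumption about the format rather than something documented here):

```python
# Minimal sketch of loading the split files; not code from this repo.
# Assumption: each line of unseen_train.txt / unseen_test.txt names one
# video sample. Adjust if the actual format differs.
from typing import List


def load_split(path: str) -> List[str]:
    """Return the non-empty, stripped lines of a split file."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


train_videos = load_split("unseen_train.txt")
val_videos = load_split("unseen_test.txt")
print(len(train_videos), "training samples;", len(val_videos), "validation samples")
```
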
The datasets are zipped in lip/*.zip. Unzip them into the same location, then run `python main.py` to start training.

Hyperparameters are found in options.py; a rough sketch of the kind of settings it collects is shown below.

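The field names below are hypothetical and only illustrate the sort of values an options-style config module typically holds (paths, training hyperparameters, a checkpoint prefix); consult options.py itself for the real settings.

```python
# Hypothetical illustration of an options.py-style config module.
# The actual field names and values in this repo's options.py may differ.
gpu = "0"                        # CUDA device id to train on
video_path = "lip"               # root directory of the unzipped videos
train_list = "unseen_train.txt"  # training split
val_list = "unseen_test.txt"     # validation split
batch_size = 32
base_lr = 2e-4                   # initial learning rate
max_epoch = 10000
num_workers = 8                  # DataLoader worker processes
save_prefix = "weights/lipnet_phonemes"  # checkpoint path prefix
```
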
## Project Setup

1. Pull this repo using `git pull https://huggingface.co/SilentSpeak/torchnet phonemes`
2. Create a Python 3.8 virtualenv for this project using `python3.8 -m venv venv`
3. Activate the virtualenv using `source venv/bin/activate`
4. Run `python -m pip install -r requirements.txt` to install the dependencies
5. Install Git LFS using `git lfs install`
6. Pull the GRID dataset and saved TensorBoard runs using `git lfs pull`

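After these steps, a quick sanity check along the following lines can confirm the environment works. This snippet is illustrative rather than part of the repo, and it assumes PyTorch is among the pinned requirements:

```python
# Illustrative post-setup sanity check; not part of the repo.
# Assumes PyTorch is one of the dependencies pinned in requirements.txt.
import sys

import torch

# The project was developed with Python 3.8, so flag other versions.
if sys.version_info[:2] != (3, 8):
    print("Warning: running Python %d.%d, but the project targets 3.8"
          % sys.version_info[:2])

print("PyTorch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```
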
Following the project setup, you can run training as follows:

- To train the LipNet phonemes predictor, run `python main.py`
- To train the LipNet phonemes-to-text transformer predictor, run `python TransformerTrainer.py`
- To train the LipNet-to-BiGRU-to-text transformer predictor, run `python TranslatorTrainer.py`

To evaluate the end-to-end pipeline (LipNet phonemes predictor + phonemes-to-text transformer), run `cd tests && python lipnet-pipeline.py`. The model weights used in `lipnet-pipeline.py` are included in the repo as LFS files in the `saved-weights` folder.

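Evaluation is reported as word error rate (the `wer` metric declared in the metadata above). For reference, and independently of the evaluation code in this repo, WER is the word-level edit distance between hypothesis and reference divided by the reference length:

```python
# Reference WER computation via word-level Levenshtein distance.
# Shown only to make the metric concrete; not this repo's evaluation code.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


# GRID-style example: one substituted word out of six -> WER of about 0.17
print(wer("bin blue at f two now", "bin blue at m two now"))
```
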
The LRS2 dataset was too large to include in the repo, and access to it is conditional on accepting the non-commercial usage license. However, the config file for training on the LRS2 dataset can be found in `options_lrs2.py`, and the preprocessing code for LRS2 can be found in `scripts/extract_crop_lips_v2.py` and `scripts/generate_lsr2_train.py`. The LRS2 dataset itself can be found at [https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html](https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html).
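As a rough illustration of the list-generation half of that preprocessing (this is a hedged sketch, not the contents of `scripts/generate_lsr2_train.py`), building a training file list for an LRS2-style directory layout amounts to walking the dataset root and writing one sample path per line:

```python
# Illustrative sketch only; the repo's scripts/generate_lsr2_train.py may
# work differently. Assumes an LRS2-style layout of <root>/<speaker>/<clip>.mp4.
import os


def write_train_list(dataset_root: str, output_path: str) -> None:
    """Write the relative path (without extension) of every .mp4 clip."""
    samples = []
    for dirpath, _dirnames, filenames in os.walk(dataset_root):
        for name in sorted(filenames):
            if name.endswith(".mp4"):
                rel = os.path.relpath(os.path.join(dirpath, name), dataset_root)
                samples.append(os.path.splitext(rel)[0])
    with open(output_path, "w", encoding="utf-8") as f:
        f.write("\n".join(sorted(samples)) + "\n")


# Example usage (paths are placeholders):
# write_train_list("LRS2/main", "lrs2_train.txt")
```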