---
license: gpl-3.0
language:
- en
metrics:
- wer
---

# LipNet Phonemes Predictors

The project was developed with Python 3.8 on Ubuntu 24.04 (Linux).

Run `python -m pip install -r requirements.txt` to make sure your dependencies match the pinned versions.

The lists of video files used for training and validation when training standard LipNet (not the phonemes predictor) are in unseen_train.txt and unseen_test.txt respectively.

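These split files are plain text. As a minimal sketch of how they can be consumed (assuming one video sample path per line, which is an assumption about the format rather than something documented here):

```python
# Minimal sketch of loading the split files; not code from this repo.
# Assumption: each line of unseen_train.txt / unseen_test.txt names one
# video sample. Adjust if the actual format differs.
from typing import List


def load_split(path: str) -> List[str]:
    """Return the non-empty, stripped lines of a split file."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


train_videos = load_split("unseen_train.txt")
val_videos = load_split("unseen_test.txt")
print(len(train_videos), "training samples;", len(val_videos), "validation samples")
```
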
The datasets are zipped in lip/*.zip. Unzip them into the same location, then run `python main.py` to start training.

Hyperparameters are found in options.py; a rough sketch of the kind of settings it collects is shown below.

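The field names below are hypothetical and only illustrate the sort of values an options-style config module typically holds (paths, training hyperparameters, a checkpoint prefix); consult options.py itself for the real settings.

```python
# Hypothetical illustration of an options.py-style config module.
# The actual field names and values in this repo's options.py may differ.
gpu = "0"                        # CUDA device id to train on
video_path = "lip"               # root directory of the unzipped videos
train_list = "unseen_train.txt"  # training split
val_list = "unseen_test.txt"     # validation split
batch_size = 32
base_lr = 2e-4                   # initial learning rate
max_epoch = 10000
num_workers = 8                  # DataLoader worker processes
save_prefix = "weights/lipnet_phonemes"  # checkpoint path prefix
```
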
## Project Setup

1. Pull this repo using `git pull https://huggingface.co/SilentSpeak/torchnet phonemes`
2. Create a Python 3.8 virtualenv for this project using `python3.8 -m venv venv`
3. Activate the virtualenv using `source venv/bin/activate`
4. Run `python -m pip install -r requirements.txt` to install the dependencies
5. Install Git LFS using `git lfs install`
6. Pull the GRID dataset and saved TensorBoard runs using `git lfs pull`

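After these steps, a quick sanity check along the following lines can confirm the environment works. This snippet is illustrative rather than part of the repo, and it assumes PyTorch is among the pinned requirements:

```python
# Illustrative post-setup sanity check; not part of the repo.
# Assumes PyTorch is one of the dependencies pinned in requirements.txt.
import sys

import torch

# The project was developed with Python 3.8, so flag other versions.
if sys.version_info[:2] != (3, 8):
    print("Warning: running Python %d.%d, but the project targets 3.8"
          % sys.version_info[:2])

print("PyTorch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```
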
Following the project setup, you can run training as follows:

- To train the LipNet phonemes predictor, run `python main.py`
- To train the LipNet phonemes-to-text transformer predictor, run `python TransformerTrainer.py`
- To train the LipNet-to-BiGRU-to-text transformer predictor, run `python TranslatorTrainer.py`

To evaluate the end-to-end pipeline (LipNet phonemes predictor + phonemes-to-text transformer), run `cd tests && python lipnet-pipeline.py`. The model weights used in `lipnet-pipeline.py` are included in the repo as LFS files in the `saved-weights` folder.

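Evaluation is reported as word error rate (the `wer` metric declared in the metadata above). For reference, and independently of the evaluation code in this repo, WER is the word-level edit distance between hypothesis and reference divided by the reference length:

```python
# Reference WER computation via word-level Levenshtein distance.
# Shown only to make the metric concrete; not this repo's evaluation code.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


# GRID-style example: one substituted word out of six -> WER of about 0.17
print(wer("bin blue at f two now", "bin blue at m two now"))
```
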
The LRS2 dataset was too large to include in the repo, and access to it is conditional on accepting the non-commercial usage license. However, the config file for training on the LRS2 dataset can be found in `options_lrs2.py`, and the preprocessing code for LRS2 can be found in `scripts/extract_crop_lips_v2.py` and `scripts/generate_lsr2_train.py`. The LRS2 dataset itself can be found at [https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html](https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html).
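As a rough illustration of the list-generation half of that preprocessing (this is a hedged sketch, not the contents of `scripts/generate_lsr2_train.py`), building a training file list for an LRS2-style directory layout amounts to walking the dataset root and writing one sample path per line:

```python
# Illustrative sketch only; the repo's scripts/generate_lsr2_train.py may
# work differently. Assumes an LRS2-style layout of <root>/<speaker>/<clip>.mp4.
import os


def write_train_list(dataset_root: str, output_path: str) -> None:
    """Write the relative path (without extension) of every .mp4 clip."""
    samples = []
    for dirpath, _dirnames, filenames in os.walk(dataset_root):
        for name in sorted(filenames):
            if name.endswith(".mp4"):
                rel = os.path.relpath(os.path.join(dirpath, name), dataset_root)
                samples.append(os.path.splitext(rel)[0])
    with open(output_path, "w", encoding="utf-8") as f:
        f.write("\n".join(sorted(samples)) + "\n")


# Example usage (paths are placeholders):
# write_train_list("LRS2/main", "lrs2_train.txt")
```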