ahmedghani committed on
Commit
af37d8b
•
1 Parent(s): 8235b4f

updated README.md

Files changed (1)
  1. README.md +10 -123
README.md CHANGED
@@ -1,123 +1,10 @@
- # Speaker Voice Separation using Neural Nets Gradio Demo
-
- ## Installation
-
- ```bash
- git clone https://github.com/Muhammad-Ahmad-Ghani/svoice_demo.git
- cd svoice_demo
- conda create -n svoice python=3.7 -y
- conda activate svoice
- conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch -y
- pip install -r requirements.txt
- ```
-
- | Pretrained Model | Dataset | Epochs | Train Loss | Valid Loss |
- |:------------:|:------------:|:------------:|:------------:|:------------:|
- | [checkpoint.th](https://drive.google.com/drive/folders/1WzhvH1oIB9LqoTyItA6jViTRai5aURzJ?usp=sharing) | Librimix-7 (16k-mix_clean) | 31 | 0.04 | 0.64 |
-
- This is an intermediate checkpoint, provided for demo purposes only.
-
- Create the directory ```outputs/exp_``` and save the checkpoint there:
- ```
- svoice_demo
- ├── outputs
- │   └── exp_
- │       └── checkpoint.th
- ...
- ```
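The layout above can be prepared with a couple of shell commands. A minimal sketch, assuming the checkpoint was downloaded into the repo root as `checkpoint.th` (the move is skipped if the file is not there yet):

```shell
# Create the experiment directory the demo expects and move the
# downloaded checkpoint into place (no-op if checkpoint.th is absent).
mkdir -p outputs/exp_
if [ -f checkpoint.th ]; then
  mv checkpoint.th outputs/exp_/checkpoint.th
fi
```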
-
- ## Running the End-to-End Project
- #### Terminal 1
- ```bash
- conda activate svoice
- python demo.py
- ```
-
- ## Training
- Create the ```mix_clean``` dataset with a ```16K``` sample rate using the [librimix](https://github.com/shakeddovrat/librimix) repo.
-
- Dataset structure:
- ```
- svoice_demo
- ├── Libri7Mix_Dataset
- │   └── wav16k
- │       └── min
- │           ├── dev
- │           │   └── ...
- │           ├── test
- │           │   └── ...
- │           └── train-360
- │               └── ...
- ...
- ```
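To confirm the dataset landed in the expected layout, a small stdlib-only sketch that lists directories down to a few levels (the directory names are the ones shown above; the helper itself is illustrative):

```python
# Walk the dataset root and collect directories down to max_depth,
# so the result can be compared against the tree shown above.
import os

def list_tree(root, max_depth=3):
    dirs = []
    root = root.rstrip(os.sep)
    base = root.count(os.sep)
    for dirpath, dirnames, _ in os.walk(root):
        if dirpath.count(os.sep) - base >= max_depth:
            dirnames[:] = []  # do not descend further
            continue
        for d in sorted(dirnames):
            dirs.append(os.path.join(dirpath, d))
    return dirs
```

For example, `list_tree("Libri7Mix_Dataset")` should show `wav16k/min` with `dev`, `test`, and `train-360` beneath it.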
-
- #### Create ```metadata``` files
- For the Librimix7 dataset:
- ```bash
- bash create_metadata_librimix7.sh
- ```
-
- For the Librimix10 dataset:
- ```bash
- bash create_metadata_librimix10.sh
- ```
-
- Edit ```conf/config.yaml``` to match your setup. Set the ```C: 10``` value at line 66 to the number of speakers.
-
- ```bash
- python train.py
- ```
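If you prefer scripting that config edit, a minimal stdlib-only sketch; the top-level `C` key comes from the note above, while the helper name and the append-if-missing behavior are assumptions:

```python
# Set the number-of-speakers key "C" in conf/config.yaml.
# Assumes C appears as a top-level "C: <int>" line, as described above.
import re
from pathlib import Path

def set_num_speakers(config_path, n):
    text = Path(config_path).read_text()
    new_text, count = re.subn(r"(?m)^C:\s*\d+", f"C: {n}", text)
    if count == 0:
        # Key not found: append it rather than silently doing nothing.
        new_text = text.rstrip() + f"\nC: {n}\n"
    Path(config_path).write_text(new_text)
    return new_text
```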
- This will automatically read all the configurations from the `conf/config.yaml` file.
- To learn more about training, refer to the original [svoice](https://github.com/facebookresearch/svoice) repo.
-
- #### Distributed Training
-
- ```bash
- python train.py ddp=1
- ```
-
- ### Evaluating
-
- ```bash
- python -m svoice.evaluate <path to the model> <path to folder containing mix.json and all target separated channels json files s<ID>.json>
- ```
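As a rough sketch of what those JSON files contain, here is a stdlib-only helper that builds a `mix.json`-style manifest of `[wav_path, num_samples]` pairs from a folder of WAVs. The exact schema svoice expects may differ; check the files produced by the metadata scripts before relying on this:

```python
# Build a manifest of [wav_path, num_samples] pairs from a folder of
# WAV files and write it as JSON (a guess at the mix.json format).
import json
import wave
from pathlib import Path

def build_manifest(wav_dir, out_json):
    entries = []
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(wav_path)) as w:
            entries.append([str(wav_path), w.getnframes()])
    Path(out_json).write_text(json.dumps(entries, indent=2))
    return entries
```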
-
- ### Citation
-
- The svoice code is borrowed from the original [svoice](https://github.com/facebookresearch/svoice) repository. All rights to the code are reserved by [Meta Research](https://github.com/facebookresearch).
-
- ```
- @inproceedings{nachmani2020voice,
-   title={Voice Separation with an Unknown Number of Multiple Speakers},
-   author={Nachmani, Eliya and Adi, Yossi and Wolf, Lior},
-   booktitle={Proceedings of the 37th International Conference on Machine Learning},
-   year={2020}
- }
- ```
- ```
- @misc{cosentino2020librimix,
-   title={LibriMix: An Open-Source Dataset for Generalizable Speech Separation},
-   author={Joris Cosentino and Manuel Pariente and Samuele Cornell and Antoine Deleforge and Emmanuel Vincent},
-   year={2020},
-   eprint={2005.11262},
-   archivePrefix={arXiv},
-   primaryClass={eess.AS}
- }
- ```
- ## License
- This repository is released under the CC-BY-NC-SA 4.0 license, as found in the [LICENSE](LICENSE) file.
-
- The files `svoice/models/sisnr_loss.py` and `svoice/data/preprocess.py` were adapted from the [kaituoxu/Conv-TasNet][convtas] repository, an unofficial implementation of the [Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation][convtas-paper] paper, released under the MIT License.
- Additionally, several input manipulation functions were borrowed and modified from the [yluo42/TAC][tac] repository, released under the CC BY-NC-SA 3.0 License.
-
- [icml]: https://arxiv.org/abs/2003.01531.pdf
- [icassp]: https://arxiv.org/pdf/2011.02329.pdf
- [web]: https://enk100.github.io/speaker_separation/
- [pytorch]: https://pytorch.org/
- [hydra]: https://github.com/facebookresearch/hydra
- [hydra-web]: https://hydra.cc/
- [convtas]: https://github.com/kaituoxu/Conv-TasNet
- [convtas-paper]: https://arxiv.org/pdf/1809.07454.pdf
- [tac]: https://github.com/yluo42/TAC
- [nprirgen]: https://github.com/ty274/rir-generator
- [rir]: https://asa.scitation.org/doi/10.1121/1.382599
 
+ ---
+ title: svoice_demo
+ emoji: 🔥
+ colorFrom: blue
+ colorTo: green
+ sdk: gradio
+ sdk_version: 3.11.0
+ app_file: app.py
+ pinned: false
+ ---