Commit af37d8b by ahmedghani • 1 Parent(s): 8235b4f
updated README.md

README.md CHANGED
@@ -1,123 +1,10 @@
-
-
-
-
-
-
-
-
-
-
-pip install -r requirements.txt
-```
-
-| Pretrained-Model | Dataset | Epochs | Train Loss | Valid Loss |
-|:------------:|:------------:|:------------:|:------------:|:------------:
-| [checkpoint.th](https://drive.google.com/drive/folders/1WzhvH1oIB9LqoTyItA6jViTRai5aURzJ?usp=sharing) | Librimix-7 (16k-mix_clean) | 31 | 0.04 | 0.64 |
-
-This is an intermediate checkpoint just for demo purpose.
-
-create directory ```outputs/exp_``` and save checkpoint there
-```
-svoice_demo
-├── outputs
-│   └── exp_
-│       └── checkpoint.th
-...
-```
-
-## Running End To End project
-#### Terminal 1
-```bash
-conda activate svoice
-python demo.py
-```
-
-## Training
-Create dataset ```mix_clean``` with sample rate ```16K``` using [librimix](https://github.com/shakeddovrat/librimix) repo.
-
-Dataset Structure
-```
-svoice_demo
-├── Libri7Mix_Dataset
-│   └── wav16k
-│       └── min
-│       │   └── dev
-│       │       └── ...
-│       │   └── test
-│       │       └── ...
-│       │   └── train-360
-│       │       └── ...
-...
-```
-
-#### Create ```metadata``` files
-For Librimix7 dataset
-```
-bash create_metadata_librimix7.sh
-```
-
-For Librimix10 dataset
-```
-bash create_metadata_librimix10.sh
-```
-
-Change ```conf/config.yaml``` according to your settings. Set ```C: 10``` value at line 66 for number of speakers.
-
-```
-python train.py
-```
-This will automatically read all the configurations from the `conf/config.yaml` file.
-To know more about the training you may refer to the original [svoice](https://github.com/facebookresearch/svoice) repo.
-
-#### Distributed Training
-
-```
-python train.py ddp=1
-```
-
-### Evaluating
-
-```
-python -m svoice.evaluate <path to the model> <path to folder containing mix.json and all target separated channels json files s<ID>.json>
-```
-
-### Citation
-
-The svoice code is borrowed from original [svoice](https://github.com/facebookresearch/svoice) repository. All rights of code are reserved by [META Research](https://github.com/facebookresearch).
-
-```
-@inproceedings{nachmani2020voice,
-title={Voice Separation with an Unknown Number of Multiple Speakers},
-author={Nachmani, Eliya and Adi, Yossi and Wolf, Lior},
-booktitle={Proceedings of the 37th international conference on Machine learning},
-year={2020}
-}
-```
-```
-@misc{cosentino2020librimix,
-title={LibriMix: An Open-Source Dataset for Generalizable Speech Separation},
-author={Joris Cosentino and Manuel Pariente and Samuele Cornell and Antoine Deleforge and Emmanuel Vincent},
-year={2020},
-eprint={2005.11262},
-archivePrefix={arXiv},
-primaryClass={eess.AS}
-}
-```
-## License
-This repository is released under the CC-BY-NC-SA 4.0 license as found in the [LICENSE](LICENSE) file.
-
-The files `svoice/models/sisnr_loss.py` and `svoice/data/preprocess.py` were adapted from the [kaituoxu/Conv-TasNet][convtas] repository. It is an unofficial implementation of the [Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation][convtas-paper] paper, released under the MIT License.
-Additionally, several input manipulation functions were borrowed and modified from the [yluo42/TAC][tac] repository, released under the CC BY-NC-SA 3.0 License.
-
-[icml]: https://arxiv.org/abs/2003.01531.pdf
-[icassp]: https://arxiv.org/pdf/2011.02329.pdf
-[web]: https://enk100.github.io/speaker_separation/
-[pytorch]: https://pytorch.org/
-[hydra]: https://github.com/facebookresearch/hydra
-[hydra-web]: https://hydra.cc/
-[convtas]: https://github.com/kaituoxu/Conv-TasNet
-[convtas-paper]: https://arxiv.org/pdf/1809.07454.pdf
-[tac]: https://github.com/yluo42/TAC
-[nprirgen]: https://github.com/ty274/rir-generator
-[rir]: https://asa.scitation.org/doi/10.1121/1.382599
+---
+title: svoice_demo
+emoji: 🔥
+colorFrom: blue
+colorTo: green
+sdk: gradio
+sdk_version: 3.11.0
+app_file: app.py
+pinned: false
+---
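
The ten added lines are the YAML front matter that Hugging Face Spaces reads from README.md to configure the Space (SDK, SDK version, entry file). As a rough illustration of how such a block can be read, here is a minimal sketch that assumes only flat `key: value` pairs between two `---` delimiters. `parse_front_matter` is a hypothetical helper, not part of any Hugging Face library, and every value is returned as a plain string; a real YAML parser would be the safer choice for anything more complex.

```python
def parse_front_matter(text: str) -> dict:
    """Parse flat `key: value` front matter delimited by `---` lines.

    Hypothetical helper for illustration: no nesting, no lists, no type
    conversion. Returns {} when the block is missing or never closed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon
            meta[key.strip()] = value.strip()
    return {}  # opening delimiter without a closing one


# Fields copied from the added hunk (emoji line left out of the sample).
README = """---
title: svoice_demo
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 3.11.0
app_file: app.py
pinned: false
---
"""

meta = parse_front_matter(README)
print(meta["sdk"], meta["sdk_version"], meta["app_file"])
```

With the sample above, the Space would be served by Gradio 3.11.0 starting from `app.py`; note that `pinned` comes back as the string `"false"`, not a boolean, because this sketch does no type conversion.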
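
The removed README asked readers to create `outputs/exp_` under the project root and save `checkpoint.th` there. A minimal sketch of that step with `pathlib` and `shutil`, assuming the checkpoint has already been downloaded to some local path; `place_checkpoint` and both of its arguments are hypothetical names used only for illustration, and the demonstration runs in a temporary directory with a placeholder file rather than a real model.

```python
import shutil
import tempfile
from pathlib import Path


def place_checkpoint(downloaded: Path, project_root: Path) -> Path:
    """Copy a downloaded checkpoint to <project_root>/outputs/exp_/checkpoint.th."""
    target_dir = project_root / "outputs" / "exp_"
    target_dir.mkdir(parents=True, exist_ok=True)  # create outputs/exp_ if missing
    target = target_dir / "checkpoint.th"
    shutil.copy2(downloaded, target)
    return target


# Demonstration with a placeholder file instead of the real checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "svoice_demo"
    root.mkdir()
    fake_checkpoint = Path(tmp) / "downloaded.th"
    fake_checkpoint.write_bytes(b"placeholder")
    dest = place_checkpoint(fake_checkpoint, root)
    relative = dest.relative_to(root).as_posix()
    copied_ok = dest.read_bytes() == b"placeholder"
    print(relative)
```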