---
license: mit
datasets:
- s3prl/mini_voxceleb1
language:
- en
metrics:
- accuracy
library_name: fairseq
pipeline_tag: audio-classification
tags:
- speech
- text
- cross-modal
- unified model
- self-supervised learning
- SpeechT5
- Speaker Identification
- Speaker Recognition
---

## SpeechT5 SID Manifest

| [**Github**](https://github.com/microsoft/SpeechT5) | [**Huggingface**](https://huggingface.co/mechanicalsea/speecht5-sid) |

This manifest is an attempt to recreate the Speaker Identification recipe used for training [SpeechT5](https://aclanthology.org/2022.acl-long.393). It was constructed from four [CMU ARCTIC](http://www.festvox.org/cmu_arctic/) speakers, namely bdl, clb, rms, and slt, with 932 utterances for training, 100 utterances for validation, and 100 utterances for evaluation.
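
The split above is produced by the scripts in `manifest/utils` (see Tools below). As a rough illustration only, the sketch below writes a fairseq-style wav manifest (audio root on the first line, then one `relative_path<TAB>num_frames` entry per line) plus an aligned speaker-label file; the directory layout, file names, and label-file format here are assumptions for illustration, not necessarily what this repository uses.

```python
# Hypothetical sketch: build train/valid/test manifests (932/100/100 utterances)
# and aligned speaker labels. Paths, file names, and formats are assumptions.
import os
import random
import soundfile as sf

SPEAKERS = ["bdl", "clb", "rms", "slt"]   # the four CMU ARCTIC speakers
ROOT = "/data/cmu_arctic"                 # hypothetical audio root directory

# Collect (relative_path, speaker) pairs for every utterance.
items = []
for spk in SPEAKERS:
    wav_dir = os.path.join(ROOT, f"cmu_us_{spk}_arctic", "wav")
    for name in sorted(os.listdir(wav_dir)):
        if name.endswith(".wav"):
            items.append((os.path.join(f"cmu_us_{spk}_arctic", "wav", name), spk))

random.seed(0)
random.shuffle(items)
splits = {"train": items[:932], "valid": items[932:1032], "test": items[1032:1132]}

for split, rows in splits.items():
    with open(f"{split}.tsv", "w") as tsv, open(f"{split}.spk", "w") as lab:
        print(ROOT, file=tsv)                         # first line: audio root
        for rel_path, spk in rows:
            frames = sf.info(os.path.join(ROOT, rel_path)).frames
            print(f"{rel_path}\t{frames}", file=tsv)  # wav path and sample count
            print(spk, file=lab)                      # speaker label, same order
```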

### Requirements

- [Fairseq](https://github.com/facebookresearch/fairseq)

### Tools

- `manifest/utils` is used to produce the manifest as well as to conduct training, validation, and evaluation.
- `manifest/iden_split.txt` and `manifest/vox1_meta.csv` are the officially released VoxCeleb1 identification split and metadata files; a parsing sketch follows this list.
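
For orientation, `iden_split.txt` in the official VoxCeleb1 release lists one utterance per line as `<split-id> <speaker-id>/<video-id>/<file>.wav`, with split ids 1/2/3 denoting train/validation/test. A minimal, hedged parsing sketch (the relative path to the file is an assumption):

```python
# Minimal sketch: read the official VoxCeleb1 identification split file.
# Each line is expected to look like "1 id10001/1zcIwhmdeo4/00001.wav",
# where the leading digit marks the split (1=train, 2=valid, 3=test).
from collections import defaultdict

SPLIT_NAMES = {"1": "train", "2": "valid", "3": "test"}

def read_iden_split(path="manifest/iden_split.txt"):
    splits = defaultdict(list)
    with open(path) as f:
        for line in f:
            split_id, wav_path = line.split()
            splits[SPLIT_NAMES[split_id]].append(wav_path)
    return splits

if __name__ == "__main__":
    for name, paths in read_iden_split().items():
        print(name, len(paths))
```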

### Model and Results

- [`speecht5_sid.pt`](.) is the Speaker Identification model fine-tuned on the released manifest as a reimplementation of the original recipe, **but with a smaller batch size**, to check that the manifest is sound (a loading sketch follows this list).
- `results` are reproduced with the released fine-tuned model.
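
A minimal sketch of loading the fine-tuned checkpoint with fairseq is given below; the `user_dir` value (where the cloned SpeechT5 code lives) is an assumption, and actual fine-tuning and evaluation should go through `manifest/utils`.

```python
# Hedged sketch: load the fine-tuned SID checkpoint with fairseq and inspect it.
# The user_dir value is an assumed path to the cloned SpeechT5 repository.
from argparse import Namespace

from fairseq import checkpoint_utils, utils

# Register SpeechT5's custom tasks and models before loading the checkpoint.
utils.import_user_module(Namespace(user_dir="SpeechT5/SpeechT5/speecht5"))

models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(["speecht5_sid.pt"])
model = models[0].eval()

num_params = sum(p.numel() for p in model.parameters())
print(f"loaded {type(model).__name__} with {num_params / 1e6:.1f}M parameters")
```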

### Reference

If you find our work useful in your research, please cite the following paper:

```bibtex
@inproceedings{ao-etal-2022-speecht5,
  title = {{S}peech{T}5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing},
  author = {Ao, Junyi and Wang, Rui and Zhou, Long and Wang, Chengyi and Ren, Shuo and Wu, Yu and Liu, Shujie and Ko, Tom and Li, Qing and Zhang, Yu and Wei, Zhihua and Qian, Yao and Li, Jinyu and Wei, Furu},
  booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month = {May},
  year = {2022},
  pages = {5723--5738},
}
```