reach-vb (HF staff) committed
Commit 63954d9
1 Parent(s): 103d46f

Create README.md

Files changed (1):
  1. README.md +62 -0
README.md ADDED
@@ -0,0 +1,62 @@
# lina-speech (beta)

Exploring "linear attention" for text-to-speech.

It predicts audio codec tokens "à la" [MusicGen](https://arxiv.org/abs/2306.05284): the residual vector quantizer codebooks are predicted with a delay pattern, so a single model covers all codebooks instead of requiring one model per level.
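
A minimal sketch of that delay pattern (illustrative only, not this repository's code; shapes, vocabulary size, and the pad token are assumptions): codebook k is shifted right by k frames, so each autoregressive step emits one token per codebook.

```python
# Illustrative MusicGen-style delay pattern for RVQ codes (not the repo's implementation).
import torch

def apply_delay_pattern(codes: torch.Tensor, pad_id: int) -> torch.Tensor:
    """Shift codebook k right by k frames so one step predicts one token per codebook."""
    num_q, seq_len = codes.shape
    out = torch.full((num_q, seq_len + num_q - 1), pad_id, dtype=codes.dtype)
    for k in range(num_q):
        out[k, k:k + seq_len] = codes[k]
    return out

def undo_delay_pattern(delayed: torch.Tensor) -> torch.Tensor:
    """Realign the shifted streams back into parallel codec frames."""
    num_q, total = delayed.shape
    seq_len = total - num_q + 1
    return torch.stack([delayed[k, k:k + seq_len] for k in range(num_q)])

codes = torch.randint(0, 1024, (4, 8))             # e.g. 4 RVQ codebooks, 8 frames
delayed = apply_delay_pattern(codes, pad_id=1024)  # shape (4, 11)
assert torch.equal(undo_delay_pattern(delayed), codes)
```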

Featuring [RWKV](https://github.com/BlinkDL/RWKV-LM), [Mamba](https://github.com/state-spaces/mamba), and [Gated Linear Attention](https://github.com/sustcsonglin/flash-linear-attention).
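
All three replace softmax attention with a fixed-size recurrent state, which is why memory stays flat as the sequence grows. A minimal, unnormalized linear-attention step, for illustration only (the actual backbones refine this idea with decay and gating):

```python
# Toy linear-attention recurrence: the whole history lives in a (d_k, d_v) state.
import torch

def linear_attention_step(state: torch.Tensor, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor):
    state = state + torch.outer(k, v)  # accumulate key-value outer products
    out = state.T @ q                  # read out with the current query, shape (d_v,)
    return out, state

d_k, d_v = 8, 8
state = torch.zeros(d_k, d_v)
for _ in range(5):                     # memory is O(d_k * d_v), independent of sequence length
    q, k, v = torch.randn(d_k), torch.randn(d_k), torch.randn(d_v)
    out, state = linear_attention_step(state, q, k, v)
```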

Compared to other LM-based TTS models:
- Easily pretrained and fine-tuned on mid-range GPUs.
- Tiny memory footprint.
- Trained on long contexts (up to 2000 tokens, ~27 s of audio).

### Models

| Model | #Params | Dataset | Checkpoint | Steps | Note |
| :---: | :---: | :---: | :---: | :---: | :---: |
| GLA | 60M, 130M | Librilight-medium | [Download](https://nubo.ircam.fr/index.php/s/wjNYLb54m7L8xf9) | 300k | GPU inference only |
| Mamba | 60M | Librilight-medium | [Download](https://nubo.ircam.fr/index.php/s/wjNYLb54m7L8xf9) | 300k | GPU inference only |
| RWKV v6 | 60M | LibriTTS | [Download](https://nubo.ircam.fr/index.php/s/wjNYLb54m7L8xf9) | 150k | GPU inference only |

### Installation
Depending on which linear-complexity LM you choose, first follow the corresponding instructions:
- For Mamba, check the [official repo](https://github.com/state-spaces/mamba).
- For GLA/RWKV inference, check [flash-linear-attention](https://github.com/sustcsonglin/flash-linear-attention).
- For RWKV training, check [RWKV-LM](https://github.com/BlinkDL/RWKV-LM).

### Inference

Download the configuration and weights above, then check `Inference.ipynb`.
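
As a quick, optional sanity check before opening the notebook (the filename below is a placeholder for whichever checkpoint you downloaded, not a file shipped with the repo):

```python
# Inspect a downloaded checkpoint; the path is a placeholder.
import torch

ckpt = torch.load("checkpoint.ckpt", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # e.g. a state_dict plus training metadata
    state = ckpt.get("state_dict", ckpt)
    n_params = sum(v.numel() for v in state.values() if torch.is_tensor(v))
    print(f"~{n_params / 1e6:.0f}M parameters")
```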

### TODO

- [x] Fix RWKV v6 inference and/or switch to the FLA implementation.
- [ ] Provide a DataModule for training (_lhotse_ might also work well).
- [ ] Implement CFG (classifier-free guidance); see the sketch after this list.
- [ ] Scale up.
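
For reference, classifier-free guidance on next-token logits usually amounts to a single interpolation step (illustrative sketch, not code from this repo; the guidance scale and shapes are assumptions):

```python
# Classifier-free guidance: push the conditional prediction away from the unconditional one.
import torch

def cfg_logits(cond_logits: torch.Tensor, uncond_logits: torch.Tensor, scale: float = 3.0) -> torch.Tensor:
    return uncond_logits + scale * (cond_logits - uncond_logits)

cond = torch.randn(1024)    # logits with text conditioning
uncond = torch.randn(1024)  # logits with the text prompt dropped
guided = cfg_logits(cond, uncond)
```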

### Acknowledgment

- The RWKV authors and the surrounding community, for carrying out high-level, truly open-source research.
- @SmerkyG, for making it easy to test cutting-edge language models.
- @lucidrains, for their huge codebase.
- @sustcsonglin, who made [GLA and FLA](https://github.com/sustcsonglin/flash-linear-attention).
- @harrisonvanderbyl, for fixing RWKV inference.

### Cite
```bib
@software{lemerle2024linaspeech,
  title  = {LinaSpeech: Exploring "linear attention" for text-to-speech.},
  author = {Lemerle, Théodor},
  url    = {https://github.com/theodorblackbird/lina-speech},
  month  = apr,
  year   = {2024}
}
```

### IRCAM

This work takes place at IRCAM and is part of the following project:
[ANR Exovoices](https://anr.fr/Projet-ANR-21-CE23-0040)

<img align="left" width="200" height="200" src="logo_ircam.jpeg">