Fix typos (#1)
- Correct pipeline tag in model card (ff0ab65f3d1598bb6622b2a85c217ce175723d6e)
- Update README.md (e6504e164bd68545ee3d356761e5261e2c783f0b)
Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
README.md CHANGED

@@ -1,28 +1,36 @@
 ---
-
-
+base_model:
+- Qwen/Qwen2.5-0.5B
 datasets:
 - openslr/librispeech_asr
 - slprl/SpokenSwag
 - slprl/sTinyStories
-
-
+library_name: transformers
+license: mit
 pipeline_tag: audio-to-audio
 ---

+# Slamming: Training a Speech Language Model on One GPU in a Day
+
+The model was presented in the paper [Slamming: Training a Speech Language Model on One GPU in a Day](https://arxiv.org/abs/2502.15814).
+
+# Paper abstract
+
+We introduce Slam, a recipe for training high-quality Speech Language Models (SLMs) on a single academic GPU in 24 hours. We do so through empirical analysis of model initialisation and architecture, synthetic training data, preference optimisation with synthetic data and tweaking all other components. We empirically demonstrate that this training recipe also scales well with more compute getting results on par with leading SLMs in a fraction of the compute cost. We hope these insights will make SLM training and research more accessible. In the context of SLM scaling laws, our results far outperform predicted compute optimal performance, giving an optimistic view to SLM feasibility. See code, data, models, samples at - https://pages.cs.huji.ac.il/adiyoss-lab/slamming .
+
 # Model Card for Model ID
-This is a Speech
+This is a Speech Language Model (SLM) trained for generating speech continuations over discrete [Hubert tokens](https://huggingface.co/slprl/mhubert-base-25hz).


 ## Model Details

 ### Model Description
-This
+This Speech Language Model, introduced in ["_Slamming_: Training a Speech Language Model on One GPU in a Day"](https://arxiv.org/abs/2502.15814), focuses on efficient training.
 It was fine-tuned from [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) over a vocabulary of 500 speech tokens extracted from
 the 11-th layer of [mhubert-25hz](https://huggingface.co/slprl/mhubert-base-25hz).

-The model was trained
-[sTinyStories](https://huggingface.co/datasets/slprl/sTinyStories).
+The model was pre-trained using next-token prediction on a subset of LibriSpeech, Libri-Light and a synthetic dataset
+[sTinyStories](https://huggingface.co/datasets/slprl/sTinyStories). It was subsequently fine-tuned with DPO on
 [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).

 - **Developed by:** [SLP-RL](https://huggingface.co/slprl)
@@ -34,10 +42,10 @@ The model was trained by next-token prediction over a subset of LibriSpeech, Lib

 - **Repository:** [https://github.com/slp-rl/slamkit](https://github.com/slp-rl/slamkit)
 - **Paper:** [https://arxiv.org/abs/2502.15814](https://arxiv.org/abs/2502.15814)
-- **Demo:** [
+- **Demo:** [https://pages.cs.huji.ac.il/adiyoss-lab/slamming/](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/)

 ## Uses
-This
+This base SpeechLM can be used to generate continuations for speech segments, or as a base for further tuning. See the _SlamKit_
 [codebase](https://github.com/slp-rl/slamkit) for more details on usage, and checkout the [demo page](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/) for some generation examples

 ### Out-of-Scope Use
@@ -46,7 +54,7 @@ This model was trained on curated speech datasets which contain mainly audio-boo


 ## How to Get Started with the Model
-We refer users to the official repository for full usage
+We refer users to the official repository for full usage explanations - [github](https://github.com/slp-rl/slamkit).


 ## Training Details
@@ -60,7 +68,7 @@ This model was trained on a subset of [LibriSpeech](https://huggingface.co/datas
 dataset [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).

 ### Training Procedure
-This model was trained by next token prediction over several
+This model was trained by next token prediction over several datasets, and then trained with DPO over [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).
 Please refer to the [paper](https://arxiv.org/abs/2502.15814) or [code](https://github.com/slp-rl/slamkit) for the full training recipes.

 #### Preprocessing
@@ -105,7 +113,7 @@ This model was trained using **only 2 Nvidia A100 GPU** for **48 hours**.

 #### Software
 The model was trained using the [*SlamKit*](https://github.com/slp-rl/slamkit) codebase which builds upon 🤗transformers extending it to support
-easy and
+easy and efficient training of Speech Language Models.

 ## Citation

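The updated "How to Get Started with the Model" section only points to the SlamKit repository. As a rough illustration of what the card implies, a checkpoint published with `library_name: transformers` can usually be driven as a plain causal LM over speech-token IDs. The sketch below assumes exactly that: `slprl/slam` is a placeholder model ID, the token IDs are dummy values, and encoding audio into mhubert-25hz units (and vocoding generated units back to audio) is left to the SlamKit tooling, which is not shown here.

```python
# Minimal sketch, not the official SlamKit recipe: treat the SLM as a causal LM
# over discrete HuBERT speech-token IDs. "slprl/slam" is a placeholder model ID.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("slprl/slam")
model.eval()

# Placeholder prompt: discrete speech units (vocabulary of 500) that would
# normally come from the 11th layer of mhubert-25hz via the SlamKit tokenizer.
speech_units = torch.tensor([[17, 204, 7, 388, 42, 99]])

with torch.no_grad():
    continuation = model.generate(speech_units, max_new_tokens=64, do_sample=True, top_p=0.95)

# The generated IDs are further speech units; turning them back into a waveform
# requires a unit vocoder (provided by the SlamKit codebase, not shown here).
print(continuation[0].tolist())
```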