MoritzLaurer and ylacombe committed
Commit fef91df · verified · 0 Parent(s)

Duplicate from parler-tts/parler-tts-mini-v1.1

Co-authored-by: Yoach Lacombe <ylacombe@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,152 @@
+ ---
+ library_name: transformers
+ tags:
+ - text-to-speech
+ - annotation
+ license: apache-2.0
+ language:
+ - en
+ pipeline_tag: text-to-speech
+ inference: false
+ datasets:
+ - parler-tts/mls_eng
+ - parler-tts/libritts_r_filtered
+ - parler-tts/libritts-r-filtered-speaker-descriptions
+ - parler-tts/mls-eng-speaker-descriptions
+ ---
+ 
+ <img src="https://huggingface.co/datasets/parler-tts/images/resolve/main/thumbnail.png" alt="Parler Logo" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
+ 
+ 
+ # Parler-TTS Mini v1.1
+ 
+ <a target="_blank" href="https://huggingface.co/spaces/parler-tts/parler_tts">
+ <img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="Open in HuggingFace"/>
+ </a>
+ 
+ **Parler-TTS Mini v1.1** is a lightweight text-to-speech (TTS) model, trained on 45K hours of audio data, that can generate high-quality, natural-sounding speech with features that can be controlled using a simple text prompt (e.g. gender, background noise, speaking rate, pitch and reverberation).
+ 
+ 🚨 **Parler-TTS Mini v1.1** is the exact same model as [Mini v1](https://huggingface.co/parler-tts/parler-tts-mini-v1). It was trained on the same datasets and with the same training configuration. The only **change** is the use of a **better prompt tokenizer**. This tokenizer has a larger vocabulary and handles byte fallback, which simplifies multilingual training. It is based on the [unsloth/llama-2-7b](https://huggingface.co/unsloth/llama-2-7b) tokenizer. Thanks to the **[AI4Bharat](https://ai4bharat.iitm.ac.in/) team**, who provided advice and assistance in improving tokenization. 🚨
+ 
+ 
+ ## 📖 Quick Index
+ * [👨‍💻 Installation](#👨‍💻-installation)
+ * [🎲 Using a random voice](#🎲-random-voice)
+ * [🎯 Using a specific speaker](#🎯-using-a-specific-speaker)
+ * [Motivation](#motivation)
+ * [Optimizing inference](https://github.com/huggingface/parler-tts/blob/main/INFERENCE.md)
+ 
+ ## 🛠️ Usage
+ 
+ 🚨 Unlike previous versions of Parler-TTS, here we use two tokenizers - one for the prompt and one for the description. 🚨
+ 
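+ To see what changed, here is a minimal sketch comparing the two tokenizers (assumptions: only `transformers` is installed, and the description tokenizer path `google/flan-t5-large` is taken from this repo's `config.json`; exact vocabulary sizes and token splits may vary by version):
+ 
+ ```py
+ from transformers import AutoTokenizer
+ 
+ # Prompt tokenizer shipped with this checkpoint: Llama-style, with byte fallback.
+ prompt_tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1.1")
+ 
+ # Description tokenizer: the text encoder's tokenizer (google/flan-t5-large).
+ description_tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
+ 
+ print(len(prompt_tokenizer))       # larger vocabulary (~90k entries per this repo's config.json)
+ print(len(description_tokenizer))  # ~32k entries for Flan-T5
+ 
+ # Byte fallback: characters missing from the vocabulary decompose into raw-byte
+ # tokens (e.g. <0xE0>) instead of collapsing to a single <unk>, which is what
+ # simplifies multilingual training.
+ print(prompt_tokenizer.tokenize("नमस्ते"))
+ ```
+ 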
+ ### 👨‍💻 Installation
+ 
+ Using Parler-TTS is as simple as "bonjour". You only need to install the library once:
+ 
+ ```sh
+ pip install git+https://github.com/huggingface/parler-tts.git
+ ```
+ 
+ ### 🎲 Random voice
+ 
+ **Parler-TTS** has been trained to generate speech with features that can be controlled with a simple text prompt, for example:
+ 
+ ```py
+ import torch
+ from parler_tts import ParlerTTSForConditionalGeneration
+ from transformers import AutoTokenizer
+ import soundfile as sf
+ 
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
+ 
+ model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1.1").to(device)
+ tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1.1")  # prompt tokenizer
+ description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)  # description tokenizer (text encoder's)
+ 
+ prompt = "Hey, how are you doing today?"
+ description = "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker's voice sounding clear and very close up."
+ 
+ input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
+ prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
+ 
+ generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
+ audio_arr = generation.cpu().numpy().squeeze()
+ sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
+ ```
+ 
+ ### 🎯 Using a specific speaker
+ 
+ To ensure speaker consistency across generations, this checkpoint was also trained on 34 speakers, characterized by name (e.g. Jon, Lea, Gary, Jenna, Mike, Laura).
+ 
+ To take advantage of this, simply adapt your text description to specify which speaker to use: `Jon's voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise.`
+ 
+ ```py
+ import torch
+ from parler_tts import ParlerTTSForConditionalGeneration
+ from transformers import AutoTokenizer
+ import soundfile as sf
+ 
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
+ 
+ model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1.1").to(device)
+ tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1.1")
+ description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)
+ 
+ prompt = "Hey, how are you doing today?"
+ description = "Jon's voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise."
+ 
+ input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
+ prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
+ 
+ generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
+ audio_arr = generation.cpu().numpy().squeeze()
+ sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
+ ```
+ 
+ **Tips**:
+ * We've set up an [inference guide](https://github.com/huggingface/parler-tts/blob/main/INFERENCE.md) to make generation faster. Think SDPA, torch.compile, batching and streaming! A short sketch of the first two follows below.
+ * Include the term "very clear audio" to generate the highest-quality audio, and "very noisy audio" for high levels of background noise.
+ * Punctuation can be used to control the prosody of the generations, e.g. use commas to add small breaks in speech.
+ * The remaining speech features (gender, speaking rate, pitch and reverberation) can be controlled directly through the prompt.
+ 
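+ As an illustration, here is a hedged sketch combining SDPA attention and `torch.compile`. It assumes a CUDA device and a `transformers`/`parler-tts` version that accepts `attn_implementation="sdpa"` for this model class; the first compiled calls are slow warm-up steps. Treat the [inference guide](https://github.com/huggingface/parler-tts/blob/main/INFERENCE.md) as the authoritative recipe (it also covers batching and streaming):
+ 
+ ```py
+ import torch
+ from parler_tts import ParlerTTSForConditionalGeneration
+ from transformers import AutoTokenizer
+ import soundfile as sf
+ 
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
+ torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
+ 
+ # SDPA attention and half precision cut memory use and latency on GPU.
+ model = ParlerTTSForConditionalGeneration.from_pretrained(
+     "parler-tts/parler-tts-mini-v1.1",
+     attn_implementation="sdpa",
+     torch_dtype=torch_dtype,
+ ).to(device)
+ 
+ # Optional: compile the forward pass. The first generations trigger
+ # compilation and are slow; subsequent calls are faster.
+ model.forward = torch.compile(model.forward, mode="reduce-overhead")
+ 
+ tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1.1")
+ description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)
+ 
+ prompt = "Hey, how are you doing today?"
+ # "very clear audio" nudges the model towards the highest recording quality.
+ description = "Laura's voice is calm and slightly expressive, with very clear audio and no background noise."
+ 
+ input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
+ prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
+ 
+ generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
+ audio_arr = generation.to(torch.float32).cpu().numpy().squeeze()  # cast fp16 -> fp32 for soundfile
+ sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
+ ```
+ 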
+ ## Motivation
+ 
+ Parler-TTS is a reproduction of work from the paper [Natural language guidance of high-fidelity text-to-speech with synthetic annotations](https://www.text-description-to-speech.com) by Dan Lyth and Simon King, from Stability AI and Edinburgh University respectively.
+ 
+ Unlike other TTS models, Parler-TTS is a **fully open-source** release. All of the datasets, pre-processing, training code and weights are released publicly under a permissive license, enabling the community to build on our work and develop their own powerful TTS models.
+ Parler-TTS was released alongside:
+ * [The Parler-TTS repository](https://github.com/huggingface/parler-tts) - you can train and fine-tune your own version of the model.
+ * [The Data-Speech repository](https://github.com/huggingface/dataspeech) - a suite of utility scripts designed to annotate speech datasets.
+ * [The Parler-TTS organization](https://huggingface.co/parler-tts) - where you can find the annotated datasets as well as the future checkpoints.
+ 
+ ## Citation
+ 
+ If you found this repository useful, please consider citing this work and also the original Stability AI paper:
+ 
+ ```
+ @misc{lacombe-etal-2024-parler-tts,
+   author = {Yoach Lacombe and Vaibhav Srivastav and Sanchit Gandhi},
+   title = {Parler-TTS},
+   year = {2024},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   howpublished = {\url{https://github.com/huggingface/parler-tts}}
+ }
+ ```
+ 
+ ```
+ @misc{lyth2024natural,
+   title = {Natural language guidance of high-fidelity text-to-speech with synthetic annotations},
+   author = {Dan Lyth and Simon King},
+   year = {2024},
+   eprint = {2402.01912},
+   archivePrefix = {arXiv},
+   primaryClass = {cs.SD}
+ }
+ ```
+ 
+ ## License
+ 
+ This model is permissively licensed under the Apache 2.0 license.
config.json ADDED
@@ -0,0 +1,273 @@
+ {
+   "_name_or_path": "/fsx/yoach/tmp/artefacts/training-mini-v2-larger-vocab/checkpoint-140000-epoch-9/",
+   "architectures": [
+     "ParlerTTSForConditionalGeneration"
+   ],
+   "audio_encoder": {
+     "_name_or_path": "ylacombe/dac_44khz",
+     "add_cross_attention": false,
+     "architectures": [
+       "DacModel"
+     ],
+     "bad_words_ids": null,
+     "begin_suppress_tokens": null,
+     "bos_token_id": null,
+     "chunk_size_feed_forward": 0,
+     "codebook_dim": 8,
+     "codebook_loss_weight": 1.0,
+     "codebook_size": 1024,
+     "commitment_loss_weight": 0.25,
+     "cross_attention_hidden_size": null,
+     "decoder_hidden_size": 1536,
+     "decoder_start_token_id": null,
+     "diversity_penalty": 0.0,
+     "do_sample": false,
+     "downsampling_ratios": [
+       2,
+       4,
+       8,
+       8
+     ],
+     "early_stopping": false,
+     "encoder_hidden_size": 64,
+     "encoder_no_repeat_ngram_size": 0,
+     "eos_token_id": null,
+     "exponential_decay_length_penalty": null,
+     "finetuning_task": null,
+     "forced_bos_token_id": null,
+     "forced_eos_token_id": null,
+     "hidden_size": 1024,
+     "hop_length": 512,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "is_decoder": false,
+     "is_encoder_decoder": false,
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "length_penalty": 1.0,
+     "max_length": 20,
+     "min_length": 0,
+     "model_type": "dac",
+     "n_codebooks": 9,
+     "no_repeat_ngram_size": 0,
+     "num_beam_groups": 1,
+     "num_beams": 1,
+     "num_return_sequences": 1,
+     "output_attentions": false,
+     "output_hidden_states": false,
+     "output_scores": false,
+     "pad_token_id": null,
+     "prefix": null,
+     "problem_type": null,
+     "pruned_heads": {},
+     "quantizer_dropout": 0.0,
+     "remove_invalid_values": false,
+     "repetition_penalty": 1.0,
+     "return_dict": true,
+     "return_dict_in_generate": false,
+     "sampling_rate": 44100,
+     "sep_token_id": null,
+     "suppress_tokens": null,
+     "task_specific_params": null,
+     "temperature": 1.0,
+     "tf_legacy_loss": false,
+     "tie_encoder_decoder": false,
+     "tie_word_embeddings": true,
+     "tokenizer_class": null,
+     "top_k": 50,
+     "top_p": 1.0,
+     "torch_dtype": "float32",
+     "torchscript": false,
+     "typical_p": 1.0,
+     "upsampling_ratios": [
+       8,
+       8,
+       4,
+       2
+     ],
+     "use_bfloat16": false
+   },
+   "decoder": {
+     "_name_or_path": "/fsx/yoach/tmp/artefacts/parler-tts-mini-v2-empty/decoder",
+     "activation_dropout": 0.0,
+     "activation_function": "gelu",
+     "add_cross_attention": true,
+     "architectures": [
+       "ParlerTTSForCausalLM"
+     ],
+     "attention_dropout": 0.0,
+     "bad_words_ids": null,
+     "begin_suppress_tokens": null,
+     "bos_token_id": 1025,
+     "chunk_size_feed_forward": 0,
+     "codebook_weights": null,
+     "cross_attention_hidden_size": null,
+     "cross_attention_implementation_strategy": null,
+     "decoder_start_token_id": null,
+     "diversity_penalty": 0.0,
+     "do_sample": false,
+     "dropout": 0.1,
+     "early_stopping": false,
+     "encoder_no_repeat_ngram_size": 0,
+     "eos_token_id": 1024,
+     "exponential_decay_length_penalty": null,
+     "ffn_dim": 4096,
+     "finetuning_task": null,
+     "forced_bos_token_id": null,
+     "forced_eos_token_id": null,
+     "hidden_size": 1024,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "initializer_factor": 0.02,
+     "is_decoder": true,
+     "is_encoder_decoder": false,
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "layerdrop": 0.0,
+     "length_penalty": 1.0,
+     "max_length": 20,
+     "max_position_embeddings": 4096,
+     "min_length": 0,
+     "model_type": "parler_tts_decoder",
+     "no_repeat_ngram_size": 0,
+     "num_attention_heads": 16,
+     "num_beam_groups": 1,
+     "num_beams": 1,
+     "num_codebooks": 9,
+     "num_cross_attention_key_value_heads": 16,
+     "num_hidden_layers": 24,
+     "num_key_value_heads": 16,
+     "num_return_sequences": 1,
+     "output_attentions": false,
+     "output_hidden_states": false,
+     "output_scores": false,
+     "pad_token_id": 1024,
+     "prefix": null,
+     "problem_type": null,
+     "pruned_heads": {},
+     "remove_invalid_values": false,
+     "repetition_penalty": 1.0,
+     "return_dict": true,
+     "return_dict_in_generate": false,
+     "rope_embeddings": false,
+     "rope_theta": 10000.0,
+     "scale_embedding": false,
+     "sep_token_id": null,
+     "suppress_tokens": null,
+     "task_specific_params": null,
+     "temperature": 1.0,
+     "tf_legacy_loss": false,
+     "tie_encoder_decoder": false,
+     "tie_word_embeddings": false,
+     "tokenizer_class": null,
+     "top_k": 50,
+     "top_p": 1.0,
+     "torch_dtype": "float32",
+     "torchscript": false,
+     "typical_p": 1.0,
+     "use_bfloat16": false,
+     "use_cache": true,
+     "use_fused_lm_heads": true,
+     "vocab_size": 1088
+   },
+   "decoder_start_token_id": 1025,
+   "is_encoder_decoder": true,
+   "model_type": "parler_tts",
+   "pad_token_id": 1024,
+   "prompt_cross_attention": false,
+   "text_encoder": {
+     "_name_or_path": "google/flan-t5-large",
+     "add_cross_attention": false,
+     "architectures": [
+       "T5ForConditionalGeneration"
+     ],
+     "bad_words_ids": null,
+     "begin_suppress_tokens": null,
+     "bos_token_id": null,
+     "chunk_size_feed_forward": 0,
+     "classifier_dropout": 0.0,
+     "cross_attention_hidden_size": null,
+     "d_ff": 2816,
+     "d_kv": 64,
+     "d_model": 1024,
+     "decoder_start_token_id": 0,
+     "dense_act_fn": "gelu_new",
+     "diversity_penalty": 0.0,
+     "do_sample": false,
+     "dropout_rate": 0.1,
+     "early_stopping": false,
+     "encoder_no_repeat_ngram_size": 0,
+     "eos_token_id": 1,
+     "exponential_decay_length_penalty": null,
+     "feed_forward_proj": "gated-gelu",
+     "finetuning_task": null,
+     "forced_bos_token_id": null,
+     "forced_eos_token_id": null,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "initializer_factor": 1.0,
+     "is_decoder": false,
+     "is_encoder_decoder": true,
+     "is_gated_act": true,
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "layer_norm_epsilon": 1e-06,
+     "length_penalty": 1.0,
+     "max_length": 20,
+     "min_length": 0,
+     "model_type": "t5",
+     "n_positions": 512,
+     "no_repeat_ngram_size": 0,
+     "num_beam_groups": 1,
+     "num_beams": 1,
+     "num_decoder_layers": 24,
+     "num_heads": 16,
+     "num_layers": 24,
+     "num_return_sequences": 1,
+     "output_attentions": false,
+     "output_hidden_states": false,
+     "output_past": true,
+     "output_scores": false,
+     "pad_token_id": 0,
+     "prefix": null,
+     "problem_type": null,
+     "pruned_heads": {},
+     "relative_attention_max_distance": 128,
+     "relative_attention_num_buckets": 32,
+     "remove_invalid_values": false,
+     "repetition_penalty": 1.0,
+     "return_dict": true,
+     "return_dict_in_generate": false,
+     "sep_token_id": null,
+     "suppress_tokens": null,
+     "task_specific_params": null,
+     "temperature": 1.0,
+     "tf_legacy_loss": false,
+     "tie_encoder_decoder": false,
+     "tie_word_embeddings": false,
+     "tokenizer_class": null,
+     "top_k": 50,
+     "top_p": 1.0,
+     "torch_dtype": null,
+     "torchscript": false,
+     "typical_p": 1.0,
+     "use_bfloat16": false,
+     "use_cache": true,
+     "vocab_size": 32128
+   },
+   "torch_dtype": "float32",
+   "transformers_version": "4.46.0.dev0",
+   "vocab_size": 90714
+ }
generation_config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1025,
+   "decoder_start_token_id": 1025,
+   "do_sample": true,
+   "eos_token_id": 1024,
+   "max_length": 2610,
+   "pad_token_id": 1024,
+   "transformers_version": "4.46.0.dev0"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f85ed0a4953b28f0bd9d3cec9f0e035df2936ba97646f315f54b42bf6ba6d0f9
+ size 3751321772
preprocessor_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "feature_extractor_type": "DacFeatureExtractor",
+   "feature_size": 1,
+   "hop_length": 512,
+   "padding_side": "right",
+   "padding_value": 0.0,
+   "return_attention_mask": true,
+   "sampling_rate": 44100
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc8fa773221597d09cfadb23a2b1bd717488a0481505469ea56d42cb044de9b5
+ size 1795391
tokenizer_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "add_prefix_space": true,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "legacy": false,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "<unk>",
+   "padding_side": "left",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }