added note about intention behind the model
README.md
CHANGED
@@ -6,6 +6,7 @@ language:
 - en
 base_model:
 - sesame/csm-1b
+pipeline_tag: text-to-audio
 ---

 ## csm-experssiva
@@ -69,4 +70,12 @@ audiofile.write("./audio.wav", np.asarray(audio), 24000)

 The future plan is to implement KTO on `csm-mlx` and further mitigate model failure cases using that approach.

-
+**Note**
+
+This model was fine-tuned to investigate whether the CSM-1b model exhibits emergent capacity to effectively compress and reconstruct whisper-style vocal features - something that traditional TTS models do not usually demonstrate.
+It also serves as a preliminary verification of the csm-mlx training setup and the correctness of its loss function.
+I want to make it clear that I do **not endorse or encourage** any inappropriate use of this model. Any unintended associations or interpretations do not reflect the intent behind this model.
+
+**License**
+
+Licence follows Expresso dataset's `cc-by-nc-4.0`, since it's trained from it!