Update README.md
README.md
CHANGED
@@ -18,8 +18,8 @@ widget:
 # MAGNeT - Small - 300M - 10secs
 
 MAGNeT is a text-to-music model capable of generating high-quality music samples conditioned on text descriptions.
-It is a
-Unlike prior work, MAGNeT doesn't require neither
+It is a masked generative non-autoregressive Transformer trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz.
+Unlike prior work, MAGNeT doesn't require neither semantic token conditioning nor model cascading, and it generates all 4 codebooks using a single non-autoregressive Transformer.
 
 MAGNeT was published in [Masked Audio Generation using a Single Non-Autoregressive Transformer](https://arxiv.org/abs/2401.04577) by *Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi*.
 
@@ -146,7 +146,7 @@ More information can be found in the paper [Masked Audio Generation using a Sing
 
 ## Limitations and biases
 
-**Data:** The data sources used to train the model are created by music professionals and covered by legal agreements with the right holders. The model is trained on
+**Data:** The data sources used to train the model are created by music professionals and covered by legal agreements with the right holders. The model is trained on 16K hours of data, we believe that scaling the model on larger datasets can further improve the performance of the model.
 
 **Mitigations:** Tracks that include vocals have been removed from the data source using corresponding tags, and using a state-of-the-art music source separation method, namely using the open source [Hybrid Transformer for Music Source Separation](https://github.com/facebookresearch/demucs) (HT-Demucs).
 
@@ -171,7 +171,6 @@ More information can be found in the paper [Masked Audio Generation using a Sing
 
 The audio-magnet models were trained on the following data sources: a subset of AudioSet (Gemmeke et al., 2017), [BBC sound effects](https://sound-effects.bbcrewind.co.uk/), AudioCaps (Kim et al., 2019), Clotho v2 (Drossos et al., 2020), VGG-Sound (Chen et al., 2020), FSD50K (Fonseca et al., 2021), [Free To Use Sounds](https://www.freetousesounds.com/all-in-one-bundle/), [Sonniss Game Effects](https://sonniss.com/gameaudiogdc), [WeSoundEffects](https://wesoundeffects.com/we-sound-effects-bundle-2020/), [Paramount Motion - Odeon Cinematic Sound Effects](https://www.paramountmotion.com/odeon-sound-effects).
 
-
 ### Evaluation datasets
 
 The audio-magnet models (sound effect generation) were evaluated on the [AudioCaps benchmark](https://audiocaps.github.io/).
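Note on the added model description: the figures it quotes (4 codebooks sampled at 50 Hz) together with the 10-second duration in the model title imply a fixed number of discrete codes per clip. The sketch below only does that arithmetic. It assumes each codebook contributes exactly one code per frame, and the function name `token_budget` is hypothetical and not part of any MAGNeT or EnCodec API.

```python
# Figures taken from the model card text; the 10 s duration comes from the title.
FRAME_RATE_HZ = 50      # codebook frames per second
NUM_CODEBOOKS = 4       # parallel codebooks per frame
DURATION_SECONDS = 10   # clip length for this checkpoint


def token_budget(duration_seconds, frame_rate_hz=FRAME_RATE_HZ,
                 num_codebooks=NUM_CODEBOOKS):
    """Return (frames, total_codes) for a clip.

    Assumption: every codebook emits one code per frame.
    """
    frames = duration_seconds * frame_rate_hz
    return frames, frames * num_codebooks


if __name__ == "__main__":
    frames, codes = token_budget(DURATION_SECONDS)
    print(f"{frames} frames, {codes} codes in total")
```

Under these assumptions a 10-second clip corresponds to 500 frames and 2000 codes. This says nothing about how many decoding steps the model uses to produce them.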