facebook
/

magnet-small-10secs

Text-to-Audio

Audiocraft

magnet

Model card Files Files and versions Community

alonzi commited on Jan 11, 2024

Commit

a57a242

verified ·

1 Parent(s): e626966

Update README.md

Browse files

Files changed (1) hide show

README.md +3 -4

README.md CHANGED Viewed

@@ -18,8 +18,8 @@ widget:
 # MAGNeT - Small - 300M - 10secs
 MAGNeT is a text-to-music model capable of generating high-quality music samples conditioned on text descriptions.
-It is a single stage, non-autoregressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz.
-Unlike prior work, MAGNeT doesn't require neither self-supervised semantic representation nor generative model cascading, and it generates all 4 codebooks using a single non-autoregressive transformer.
 MAGNeT was published in [Masked Audio Generation using a Single Non-Autoregressive Transformer](https://arxiv.org/abs/2401.04577) by *Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi*.
@@ -146,7 +146,7 @@ More information can be found in the paper [Masked Audio Generation using a Sing
 ## Limitations and biases
-**Data:** The data sources used to train the model are created by music professionals and covered by legal agreements with the right holders. The model is trained on 15K hours of data, we believe that scaling the model on larger datasets can further improve the performance of the model.
 **Mitigations:** Tracks that include vocals have been removed from the data source using corresponding tags, and using a state-of-the-art music source separation method, namely using the open source [Hybrid Transformer for Music Source Separation](https://github.com/facebookresearch/demucs) (HT-Demucs).
@@ -171,7 +171,6 @@ More information can be found in the paper [Masked Audio Generation using a Sing
 The audio-magnet models were trained on the following data sources: a subset of AudioSet (Gemmeke et al., 2017), [BBC sound effects](https://sound-effects.bbcrewind.co.uk/), AudioCaps (Kim et al., 2019), Clotho v2 (Drossos et al., 2020), VGG-Sound (Chen et al., 2020), FSD50K (Fonseca et al., 2021), [Free To Use Sounds](https://www.freetousesounds.com/all-in-one-bundle/), [Sonniss Game Effects](https://sonniss.com/gameaudiogdc), [WeSoundEffects](https://wesoundeffects.com/we-sound-effects-bundle-2020/), [Paramount Motion - Odeon Cinematic Sound Effects](https://www.paramountmotion.com/odeon-sound-effects).
 ### Evaluation datasets
 The audio-magnet models (sound effect generation) were evaluated on the [AudioCaps benchmark](https://audiocaps.github.io/).

 # MAGNeT - Small - 300M - 10secs
 MAGNeT is a text-to-music model capable of generating high-quality music samples conditioned on text descriptions.
+It is a masked generative non-autoregressive Transformer trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz.
+Unlike prior work, MAGNeT doesn't require neither semantic token conditioning nor model cascading, and it generates all 4 codebooks using a single non-autoregressive Transformer.
 MAGNeT was published in [Masked Audio Generation using a Single Non-Autoregressive Transformer](https://arxiv.org/abs/2401.04577) by *Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi*.
 ## Limitations and biases
+**Data:** The data sources used to train the model are created by music professionals and covered by legal agreements with the right holders. The model is trained on 16K hours of data, we believe that scaling the model on larger datasets can further improve the performance of the model.
 **Mitigations:** Tracks that include vocals have been removed from the data source using corresponding tags, and using a state-of-the-art music source separation method, namely using the open source [Hybrid Transformer for Music Source Separation](https://github.com/facebookresearch/demucs) (HT-Demucs).
 The audio-magnet models were trained on the following data sources: a subset of AudioSet (Gemmeke et al., 2017), [BBC sound effects](https://sound-effects.bbcrewind.co.uk/), AudioCaps (Kim et al., 2019), Clotho v2 (Drossos et al., 2020), VGG-Sound (Chen et al., 2020), FSD50K (Fonseca et al., 2021), [Free To Use Sounds](https://www.freetousesounds.com/all-in-one-bundle/), [Sonniss Game Effects](https://sonniss.com/gameaudiogdc), [WeSoundEffects](https://wesoundeffects.com/we-sound-effects-bundle-2020/), [Paramount Motion - Odeon Cinematic Sound Effects](https://www.paramountmotion.com/odeon-sound-effects).
 ### Evaluation datasets
 The audio-magnet models (sound effect generation) were evaluated on the [AudioCaps benchmark](https://audiocaps.github.io/).