alonzi commited on
Commit
a57a242
1 Parent(s): e626966

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +3 -4
README.md CHANGED
@@ -18,8 +18,8 @@ widget:
18
  # MAGNeT - Small - 300M - 10secs
19
 
20
  MAGNeT is a text-to-music model capable of generating high-quality music samples conditioned on text descriptions.
21
- It is a single stage, non-autoregressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz.
22
- Unlike prior work, MAGNeT doesn't require neither self-supervised semantic representation nor generative model cascading, and it generates all 4 codebooks using a single non-autoregressive transformer.
23
 
24
  MAGNeT was published in [Masked Audio Generation using a Single Non-Autoregressive Transformer](https://arxiv.org/abs/2401.04577) by *Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi*.
25
 
@@ -146,7 +146,7 @@ More information can be found in the paper [Masked Audio Generation using a Sing
146
 
147
  ## Limitations and biases
148
 
149
- **Data:** The data sources used to train the model are created by music professionals and covered by legal agreements with the right holders. The model is trained on 15K hours of data, we believe that scaling the model on larger datasets can further improve the performance of the model.
150
 
151
  **Mitigations:** Tracks that include vocals have been removed from the data source using corresponding tags, and using a state-of-the-art music source separation method, namely using the open source [Hybrid Transformer for Music Source Separation](https://github.com/facebookresearch/demucs) (HT-Demucs).
152
 
@@ -171,7 +171,6 @@ More information can be found in the paper [Masked Audio Generation using a Sing
171
 
172
  The audio-magnet models were trained on the following data sources: a subset of AudioSet (Gemmeke et al., 2017), [BBC sound effects](https://sound-effects.bbcrewind.co.uk/), AudioCaps (Kim et al., 2019), Clotho v2 (Drossos et al., 2020), VGG-Sound (Chen et al., 2020), FSD50K (Fonseca et al., 2021), [Free To Use Sounds](https://www.freetousesounds.com/all-in-one-bundle/), [Sonniss Game Effects](https://sonniss.com/gameaudiogdc), [WeSoundEffects](https://wesoundeffects.com/we-sound-effects-bundle-2020/), [Paramount Motion - Odeon Cinematic Sound Effects](https://www.paramountmotion.com/odeon-sound-effects).
173
 
174
-
175
  ### Evaluation datasets
176
 
177
  The audio-magnet models (sound effect generation) were evaluated on the [AudioCaps benchmark](https://audiocaps.github.io/).
 
18
  # MAGNeT - Small - 300M - 10secs
19
 
20
  MAGNeT is a text-to-music model capable of generating high-quality music samples conditioned on text descriptions.
21
+ It is a masked generative non-autoregressive Transformer trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz.
22
+ Unlike prior work, MAGNeT doesn't require neither semantic token conditioning nor model cascading, and it generates all 4 codebooks using a single non-autoregressive Transformer.
23
 
24
  MAGNeT was published in [Masked Audio Generation using a Single Non-Autoregressive Transformer](https://arxiv.org/abs/2401.04577) by *Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi*.
25
 
 
146
 
147
  ## Limitations and biases
148
 
149
+ **Data:** The data sources used to train the model are created by music professionals and covered by legal agreements with the right holders. The model is trained on 16K hours of data, we believe that scaling the model on larger datasets can further improve the performance of the model.
150
 
151
  **Mitigations:** Tracks that include vocals have been removed from the data source using corresponding tags, and using a state-of-the-art music source separation method, namely using the open source [Hybrid Transformer for Music Source Separation](https://github.com/facebookresearch/demucs) (HT-Demucs).
152
 
 
171
 
172
  The audio-magnet models were trained on the following data sources: a subset of AudioSet (Gemmeke et al., 2017), [BBC sound effects](https://sound-effects.bbcrewind.co.uk/), AudioCaps (Kim et al., 2019), Clotho v2 (Drossos et al., 2020), VGG-Sound (Chen et al., 2020), FSD50K (Fonseca et al., 2021), [Free To Use Sounds](https://www.freetousesounds.com/all-in-one-bundle/), [Sonniss Game Effects](https://sonniss.com/gameaudiogdc), [WeSoundEffects](https://wesoundeffects.com/we-sound-effects-bundle-2020/), [Paramount Motion - Odeon Cinematic Sound Effects](https://www.paramountmotion.com/odeon-sound-effects).
173
 
 
174
  ### Evaluation datasets
175
 
176
  The audio-magnet models (sound effect generation) were evaluated on the [AudioCaps benchmark](https://audiocaps.github.io/).