alonzi committed
Commit f1961dc
1 Parent(s): d671b54

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -137,7 +137,9 @@ The model was trained on licensed data using the following sources: the [Meta Mu
 
 ## Evaluation results
 
- Below are the objective metrics obtained on MusicCaps with the released model. Note that for the publicly released models, we had all the datasets go through a state-of-the-art music source separation method, namely using the open source [Hybrid Transformer for Music Source Separation](https://github.com/facebookresearch/demucs) (HT-Demucs), in order to keep only the instrumental part. This explains the difference in objective metrics with the models used in the paper.
+ Below are the objective metrics obtained on MusicCaps with the released model. Note that for the publicly released models, we used the state-of-the-art music source separation method,
+ namely the open source [Hybrid Transformer for Music Source Separation](https://github.com/facebookresearch/demucs) (HT-Demucs),
+ in order to keep only instrumental tracks. This explains the difference in objective metrics with the models used in the paper.
 
 | Model | Frechet Audio Distance | KLD | Text Consistency |
 |---|---|---|---|
@@ -150,7 +152,7 @@ More information can be found in the paper [Masked Audio Generation using a Sing
 
 ## Limitations and biases
 
- **Data:** The data sources used to train the model are created by music professionals and covered by legal agreements with the right holders. The model is trained on 20K hours of data, we believe that scaling the model on larger datasets can further improve the performance of the model.
+ **Data:** The data sources used to train the model are created by music professionals and covered by legal agreements with the right holders. The model is trained on 15K hours of data, we believe that scaling the model on larger datasets can further improve the performance of the model.
 
 **Mitigations:** Tracks that include vocals have been removed from the data source using corresponding tags, and using a state-of-the-art music source separation method, namely using the open source [Hybrid Transformer for Music Source Separation](https://github.com/facebookresearch/demucs) (HT-Demucs).
 
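The vocal-removal step referenced in both hunks relies on the open source demucs package linked above. A minimal sketch of that two-stem separation, assuming `pip install demucs` and the pretrained `htdemucs` weights; the input filename is a hypothetical placeholder:

```python
# Hedged sketch: split a track into vocals / no_vocals with HT-Demucs,
# keeping only the instrumental ("no_vocals") stem.
import demucs.separate

demucs.separate.main([
    "--two-stems", "vocals",  # separate vocals from everything else
    "-n", "htdemucs",         # Hybrid Transformer Demucs weights
    "input_track.mp3",        # hypothetical input file
])
# Stems are written under separated/htdemucs/input_track/
# as vocals.wav and no_vocals.wav; the latter is the instrumental part.
```

The CLI equivalent is `demucs --two-stems=vocals -n htdemucs input_track.mp3`.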
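For readers unfamiliar with the first metric in the table: Fréchet Audio Distance compares Gaussian statistics of embeddings extracted from reference and generated audio. A minimal sketch of the distance computation itself, assuming the embeddings (for example, VGGish frames as in the original FAD setup) have already been extracted; the array names are hypothetical:

```python
# Hedged sketch of the Frechet Audio Distance between two embedding sets,
# each of shape (n_samples, embedding_dim). Lower values are better.
import numpy as np
from scipy import linalg

def frechet_audio_distance(ref_emb: np.ndarray, gen_emb: np.ndarray) -> float:
    mu_r, mu_g = ref_emb.mean(axis=0), gen_emb.mean(axis=0)
    sigma_r = np.cov(ref_emb, rowvar=False)
    sigma_g = np.cov(gen_emb, rowvar=False)
    # Matrix square root of the covariance product; drop tiny imaginary residue.
    covmean = linalg.sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))
```

Lower values are also better for KLD, while Text Consistency is higher-is-better.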