---
license: mit
---

# Model description

This is EnCodecMAE, an audio feature extractor pretrained with masked language modelling to predict discrete targets generated by EnCodec, a neural audio codec.

For more details about the architecture and pretraining procedure, read the [paper](https://arxiv.org/abs/2309.07391).

# Usage

### 1) Clone the [EnCodecMAE library](https://github.com/habla-liaa/encodecmae):

```
git clone https://github.com/habla-liaa/encodecmae.git
```

### 2) Install it:

```
cd encodecmae
pip install -e .
```

### 3) Extract embeddings in Python:

```python
from encodecmae import load_model

# Load the pretrained 'base' model onto the first CUDA GPU.
model = load_model('base', device='cuda:0')
# Extract features from a single audio file.
features = model.extract_features_from_file('gsc/bed/00176480_nohash_0.wav')
```
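
To process a whole dataset instead of a single file, the call above can be wrapped in a small loop. The following is a minimal sketch, not part of the EnCodecMAE library: `extract_folder` is a hypothetical helper, and it assumes only that the extractor is a callable that takes a file path, as `model.extract_features_from_file` does in step 3.

```python
from pathlib import Path
from typing import Any, Callable, Dict


def extract_folder(
    extract: Callable[[str], Any],
    root: str,
    pattern: str = "*.wav",
) -> Dict[str, Any]:
    """Apply `extract` to every file under `root` that matches `pattern`.

    Hypothetical helper (not provided by encodecmae). `extract` is any
    callable that maps a file path to features.
    """
    features: Dict[str, Any] = {}
    # rglob searches `root` and all of its subdirectories.
    # Paths are sorted so the processing order is deterministic.
    for path in sorted(Path(root).rglob(pattern)):
        features[str(path)] = extract(str(path))
    return features
```

With the model loaded as in step 3, this could be called as `extract_folder(model.extract_features_from_file, 'gsc')`, which returns a dictionary mapping each file path to its features.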