# MERT
MERT (Acoustic Music Understanding Model with Large-Scale Self-supervised Training) incorporates teacher models to provide pseudo labels for masked language modelling (MLM)-style acoustic pre-training.

The pre-trained weights come from m-a-p/MERT-v1-95M. In this repository, we register MERT with the `AutoModelForAudioClassification` auto class.
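For reference, the registration mechanism looks roughly like the sketch below. This is a minimal illustration only: `MertConfig` and `MertForAudioClassification` are placeholder stand-ins for the classes shipped in this repo's custom code (which is loaded via `trust_remote_code=True`), and in a published repo the mapping is typically declared through the `auto_map` field of `config.json` rather than at runtime.

```python
import torch
from transformers import (
    AutoConfig,
    AutoModelForAudioClassification,
    PretrainedConfig,
    PreTrainedModel,
)

# Placeholder config; the real MertConfig lives in this repo's custom code
class MertConfig(PretrainedConfig):
    model_type = 'mert'

# Placeholder model; the real class wraps the MERT encoder plus a classifier head
class MertForAudioClassification(PreTrainedModel):
    config_class = MertConfig

    def __init__(self, config):
        super().__init__(config)
        self.classifier = torch.nn.Linear(1, config.num_labels)

    def forward(self, input_values, **kwargs):
        # Stub forward pass: mean-pool the waveform and classify the result
        pooled = input_values.mean(dim=-1, keepdim=True)
        return {'logits': self.classifier(pooled)}

# Map the custom config/model onto the auto classes
AutoConfig.register('mert', MertConfig)
AutoModelForAudioClassification.register(MertConfig, MertForAudioClassification)

# After registration, the auto class can build the model from its config
model = AutoModelForAudioClassification.from_config(MertConfig())
```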
## Usage
```python
import numpy as np
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

# Some configurations
model_id = 'yangwang825/mert-base'
batch_size = 4
num_classes = 10
max_duration = 1.0  # seconds

# Initialise the extractor and model
feature_extractor = AutoFeatureExtractor.from_pretrained(
    model_id,
    trust_remote_code=True
)
mert = AutoModelForAudioClassification.from_pretrained(
    model_id,
    num_labels=num_classes,
    ignore_mismatched_sizes=True,
    trust_remote_code=True
)

# Simulate a batch of waveforms of varying lengths (e.g. four audio clips)
audio_arrays = [
    np.random.rand(16000),
    np.random.rand(24000),
    np.random.rand(22050),
    np.random.rand(44100)
]
inputs = feature_extractor(
    audio_arrays,  # List of waveforms in numpy array format
    sampling_rate=feature_extractor.sampling_rate,
    max_length=int(feature_extractor.sampling_rate * max_duration),
    padding='max_length',
    truncation=True,
    return_tensors='pt'
)

# The shape of `input_values` is (batch_size, sampling_rate * max_duration)
input_values = inputs['input_values']
outputs = mert(**inputs)

# The shape of `logits` is (batch_size, num_classes)
logits = outputs['logits']
```
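From the logits, class probabilities and predicted labels follow in the usual way. Note that with `num_labels=num_classes` and `ignore_mismatched_sizes=True` the classification head is freshly initialised, so these predictions are meaningless until the model is fine-tuned on a labelled dataset.

```python
probs = logits.softmax(dim=-1)  # Class probabilities, shape (batch_size, num_classes)
preds = logits.argmax(dim=-1)   # Predicted class indices, shape (batch_size,)
```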