Custom Classifier


Hello @sanchit-gandhi,
Thanks for sharing this audio classification model!
Could you please suggest how I can use my own audio classifier? Is my understanding correct that AutoModelForAudioClassification actually uses WhisperForAudioClassification from https://github.com/huggingface/transformers/blob/v4.29.1/src/transformers/models/whisper/modeling_whisper.py#L1664?
If yes, can I define my own custom audio classifier the way WhisperForAudioClassification is defined, with suitable modifications, and use it directly without going through AutoModelForAudioClassification?
I am very much looking forward to hearing from you soon :)
Regards,
K

Hey @kimedaka - you can see the source code for WhisperForAudioClassification here and make any changes as desired (you can copy the modelling file, update the imports from relative to absolute, then make the changes you want).
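
As a rough sketch (the file name, class name and checkpoint below are only placeholders for whatever you end up creating), the result could look something like:

    # Minimal sketch: assumes you copied modeling_whisper.py into a local file
    # custom_whisper.py (placeholder name), switched its relative imports
    # (e.g. `from ...activations import ACT2FN`) to absolute ones
    # (`from transformers.activations import ACT2FN`), and renamed / modified
    # the classification class as you like.
    from transformers import AutoFeatureExtractor

    # hypothetical import of your modified copy of WhisperForAudioClassification
    from custom_whisper import MyWhisperForAudioClassification

    checkpoint = "openai/whisper-small"  # example checkpoint

    feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)

    # instantiate the custom class directly - no AutoModelForAudioClassification needed
    model = MyWhisperForAudioClassification.from_pretrained(
        checkpoint,
        num_labels=8,  # example number of classes for your task
    )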

Hello @sanchit-gandhi,

Thanks so very much for your suggestion! It did work for me :)

  1. I saw in your code for WhisperForAudioClassification (line #1724) that you defined freeze_encoder(), but in your run_audio_classification.py script for this model you used model.freeze_feature_encoder() (line #357) instead of freeze_encoder(). Are these two methods the same? If not, why did you use model.freeze_feature_encoder()?

  2. In your blog post "Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers", you used:

    # compute log-Mel input features from input audio array
    batch["input_features"] = feature_extractor(audio["array"], sampling_rate=audio["sampling_rate"]).input_features[0]

  • that is, you took the first element of the input_features list returned by the feature extractor, but in the run_audio_classification.py script for this model (line #303) you used input_features itself, not its first element (i.e. output_batch = {model_input_name: inputs.get(model_input_name)}). What is the reason for doing so? (A short sketch illustrating both points follows below.)
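
To make both questions concrete, here is a small, self-contained sketch of what I am comparing (the checkpoint name is only an example, and the shapes in the comments are what I expect rather than something authoritative):

    import numpy as np
    from transformers import WhisperFeatureExtractor, WhisperForAudioClassification

    checkpoint = "openai/whisper-small"  # example checkpoint only

    # --- question 1: freezing ---
    model = WhisperForAudioClassification.from_pretrained(checkpoint)

    # freeze_encoder() (defined in modeling_whisper.py) disables gradients for
    # every parameter of model.encoder
    model.freeze_encoder()
    assert not any(p.requires_grad for p in model.encoder.parameters())

    # --- question 2: input_features indexing ---
    feature_extractor = WhisperFeatureExtractor.from_pretrained(checkpoint)

    # dummy 1-second mono audio at 16 kHz, just to make the shapes concrete
    audio_array = np.zeros(16_000, dtype=np.float32)

    # blog-post pattern: index [0] to pull out the features of this single example
    single = feature_extractor(audio_array, sampling_rate=16_000).input_features[0]

    # run_audio_classification.py pattern: keep the whole entry, so the batch
    # dimension is preserved
    inputs = feature_extractor(audio_array, sampling_rate=16_000)
    model_input_name = feature_extractor.model_input_names[0]  # "input_features"
    batched = inputs.get(model_input_name)

    print(np.asarray(single).shape)   # I expect something like (80, 3000)
    print(np.asarray(batched).shape)  # and (1, 80, 3000), i.e. a batch of one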

Regards,
K
