Preprocessor for `facebook/hubert-base-ls960`

#1
by chompk - opened

Hi,

I'm trying to use this model for an experiment with a downstream task. I'm following this tutorial for using the HuBERT model. Here's my code snippet:

from transformers import HubertForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/hubert-base-ls960")
model = HubertForCTC.from_pretrained("facebook/hubert-base-ls960")

However, the code raises the following error:

/path/to/python/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py:58: FutureWarning: Loading a tokenizer inside Wav2Vec2Processor from a config that does not include a `tokenizer_class` attribute is deprecated and will be removed in v5. Please add `'tokenizer_class': 'Wav2Vec2CTCTokenizer'` attribute to either your `config.json` or `tokenizer_config.json` file to suppress this warning:

...

OSError: Can't load tokenizer for 'facebook/hubert-base-ls960'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'facebook/hubert-base-ls960' is the correct path to a directory containing all relevant files for a Wav2Vec2CTCTokenizer tokenizer.

However, when I retry this code with facebook/hubert-large-ls960-ft, no error is shown and the code runs fine. Does this mean that facebook/hubert-base-ls960 doesn't have a preprocessor? If so, are any normalization steps required?

Resolved. It seems that I need to declare the tokenizer and preprocessor myself, as described here.
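
For anyone landing here with the same error, here is a minimal sketch of what declaring the tokenizer and preprocessor by hand could look like. It is not taken from the linked guide. The character vocabulary, the temporary `vocab.json` path, and the feature extractor settings (in particular `do_normalize`) are assumptions for illustration. The error above suggests the base checkpoint ships no tokenizer files, so the vocabulary has to come from the downstream dataset.

```python
import json
import os
import tempfile

import numpy as np
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2Processor,
)

# Hypothetical character vocabulary. A real one would be built from the
# transcripts of the downstream dataset.
vocab = {"[PAD]": 0, "[UNK]": 1, "|": 2}
for ch in "abcdefghijklmnopqrstuvwxyz'":
    vocab[ch] = len(vocab)

# The tokenizer reads its vocabulary from a JSON file on disk.
vocab_dir = tempfile.mkdtemp()
vocab_path = os.path.join(vocab_dir, "vocab.json")
with open(vocab_path, "w", encoding="utf-8") as f:
    json.dump(vocab, f)

tokenizer = Wav2Vec2CTCTokenizer(
    vocab_path,
    unk_token="[UNK]",
    pad_token="[PAD]",
    word_delimiter_token="|",
)

# Assumed settings: 16 kHz mono audio. do_normalize=True applies zero-mean,
# unit-variance normalization per utterance; set it to match the checkpoint's
# preprocessor_config.json rather than trusting this value.
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1,
    sampling_rate=16000,
    padding_value=0.0,
    do_normalize=True,
    return_attention_mask=False,
)

# Bundle both into a single processor object.
processor = Wav2Vec2Processor(
    feature_extractor=feature_extractor, tokenizer=tokenizer
)

# Quick check on one second of synthetic audio.
waveform = np.random.randn(16000).astype(np.float32)
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="np")
print(inputs.input_values.shape)
```

When loading the model for CTC fine-tuning with a vocabulary like this, the output layer presumably needs to match it, for example by passing `vocab_size=len(processor.tokenizer)` and `pad_token_id=processor.tokenizer.pad_token_id` to `HubertForCTC.from_pretrained`.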

chompk changed discussion status to closed
