Preprocessor for `facebook/hubert-base-ls960`
#1 by chompk - opened
Hi,
I'm trying to use this model in an experiment on a downstream task. I'm following this tutorial for using the HuBERT model. Here's my code snippet:
from transformers import HubertForCTC, Wav2Vec2Processor
processor = Wav2Vec2Processor.from_pretrained("facebook/hubert-base-ls960")
model = HubertForCTC.from_pretrained("facebook/hubert-base-ls960")
However, the code raises the following error:
/path/to/python/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py:58: FutureWarning: Loading a tokenizer inside Wav2Vec2Processor from a config that does not include a `tokenizer_class` attribute is deprecated and will be removed in v5. Please add `'tokenizer_class': 'Wav2Vec2CTCTokenizer'` attribute to either your `config.json` or `tokenizer_config.json` file to suppress this warning:
...
OSError: Can't load tokenizer for 'facebook/hubert-base-ls960'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'facebook/hubert-base-ls960' is the correct path to a directory containing all relevant files for a Wav2Vec2CTCTokenizer tokenizer.
However, when I retry this code with `facebook/hubert-large-ls960-ft`, no error is shown and the code runs just fine. Does this mean that `facebook/hubert-base-ls960` doesn't have a preprocessor? If so, are there any normalization steps required?
chompk changed discussion status to closed
I am clear on the solution; what needs to be changed?
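One possible change, offered as a sketch rather than a confirmed fix: both the warning and the `OSError` above come from the tokenizer half of `Wav2Vec2Processor`, so loading only the feature extractor should sidestep the tokenizer lookup. This assumes the checkpoint ships a feature-extractor config, and that a bare `HubertModel` encoder is what the downstream task needs (a CTC head and vocabulary would have to be supplied separately). The helper names `load_feature_extractor_and_model` and `zero_mean_unit_variance` are hypothetical. The second helper only illustrates what zero-mean, unit-variance normalization does to a single waveform; whether the feature extractor applies it is controlled by the `do_normalize` setting in the checkpoint's preprocessor config, so it is worth checking that value rather than assuming it.

```python
import math

MODEL_ID = "facebook/hubert-base-ls960"


def load_feature_extractor_and_model(model_id=MODEL_ID):
    """Load only the audio front end and the bare encoder (no tokenizer)."""
    # Imported lazily so the normalization helper below works without transformers.
    from transformers import HubertModel, Wav2Vec2FeatureExtractor

    # Assumption: the checkpoint provides a feature-extractor config but no tokenizer files.
    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_id)
    # HubertModel returns hidden states only; it has no CTC output layer.
    model = HubertModel.from_pretrained(model_id)
    return feature_extractor, model


def zero_mean_unit_variance(samples, eps=1e-7):
    """Illustrative only: scale one waveform to zero mean and unit variance."""
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    # eps keeps the division stable for silent (constant) input.
    scale = math.sqrt(variance + eps)
    return [(s - mean) / scale for s in samples]
```

With the two objects loaded, `feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")` produces the model inputs, and `model(**inputs).last_hidden_state` gives the frame-level features for a downstream task. The 16 kHz rate is an assumption based on the LibriSpeech training data; the value in the checkpoint's config takes precedence.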