How to fix "TypeError: expected str, bytes or os.PathLike object, not NoneType" when specifying the local whisper model

#65
by BenjaminChu - opened

Here is my modified code specifying the local path of whisper files:


import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Raw string so the backslashes in the Windows path are not read as escape sequences
model_id = r"E:\LLM\whisper-large-v3\models"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)


result = pipe("audio.mp3", return_timestamps=True)
print(result["text"])

And it shows:

Traceback (most recent call last):
  File "e:\LLM\whisper-large-v3\main.py", line 15, in <module>
    processor = AutoProcessor.from_pretrained(model_id)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\LLM\chatglm2-6b\.venv\Lib\site-packages\transformers\models\auto\processing_auto.py", line 268, in from_pretrained
    return processor_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\LLM\chatglm2-6b\.venv\Lib\site-packages\transformers\processing_utils.py", line 184, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\LLM\chatglm2-6b\.venv\Lib\site-packages\transformers\processing_utils.py", line 228, in _get_arguments_from_pretrained
    args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\LLM\chatglm2-6b\.venv\Lib\site-packages\transformers\tokenization_utils_base.py", line 1825, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "E:\LLM\chatglm2-6b\.venv\Lib\site-packages\transformers\tokenization_utils_base.py", line 1988, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\LLM\chatglm2-6b\.venv\Lib\site-packages\transformers\models\whisper\tokenization_whisper.py", line 293, in __init__
    with open(merges_file, encoding="utf-8") as merges_handle:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected str, bytes or os.PathLike object, not NoneType

How to fix that?
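The last frame of the traceback fails at open(merges_file) because merges_file is None. This usually suggests that the tokenizer could not find merges.txt in the local folder, not that the path string itself is the wrong type. A minimal sketch for checking the folder is below. The helper name missing_files and the file list are assumptions for illustration; the exact set of files a given transformers version expects may differ.

```python
from pathlib import Path

# Assumed minimal set of files for a local Whisper checkpoint (illustrative only):
# model config, feature-extractor config, and the two files the slow tokenizer opens.
EXPECTED_FILES = ("config.json", "preprocessor_config.json", "vocab.json", "merges.txt")


def missing_files(model_dir, expected=EXPECTED_FILES):
    """Return the names from `expected` that are not present in `model_dir`."""
    folder = Path(model_dir)
    return [name for name in expected if not (folder / name).is_file()]


# Hypothetical usage with the path from the question:
# print(missing_files(r"E:\LLM\whisper-large-v3\models"))
```

If merges.txt or vocab.json shows up in the returned list, copying the missing tokenizer files from the model repository into the folder is a likely fix.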

Same here. I have already tried everything in the book to get the path recognized as an os.PathLike, but nothing seems to work.

A small update from my side: I experimented a lot more and found something that works. Before the result = ... line, add either model.save_pretrained(path) or pipe.save_pretrained(path) (I'm not sure which of the two actually did the job, so I called both). This saves everything needed from the online repo so that it can be used locally. Afterwards, set the model id to that path and delete the added lines.
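The workaround above can be sketched roughly as follows. This is an illustration under stated assumptions, not the poster's exact code: the function name export_whisper_locally is made up, the repo id "openai/whisper-large-v3" is assumed, and processor.save_pretrained is used here to write the tokenizer and feature-extractor files, where the post used pipe.save_pretrained.

```python
from pathlib import Path


def export_whisper_locally(repo_id, target_dir):
    """Load model and processor from the Hub once, then save both to target_dir."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Requires network access (or a populated Hub cache) on the first run.
    model = AutoModelForSpeechSeq2Seq.from_pretrained(repo_id)
    processor = AutoProcessor.from_pretrained(repo_id)

    # Writes the model weights and config, then the tokenizer and
    # feature-extractor files, into the same folder.
    model.save_pretrained(target)
    processor.save_pretrained(target)
    return target


# Hypothetical one-off call (repo id is an assumption):
# export_whisper_locally("openai/whisper-large-v3", r"E:\LLM\whisper-large-v3\models")
```

After one successful run, the folder can be passed as model_id to from_pretrained, and the export call is no longer needed.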
