Missing files?

#1
by eyldlv - opened

Hi! Thanks for uploading this model! I'm trying to load it, but I'm getting the following errors:

>>> feature_extractor = WhisperFeatureExtractor.from_pretrained(peft_model_id)

OSError: Flurin17/whisper-large-v3-peft-swiss-german does not appear to have a file named preprocessor_config.json. Checkout 'https://huggingface.co/Flurin17/whisper-large-v3-peft-swiss-german/main' for available files.

and

>>> tokenizer = WhisperTokenizer.from_pretrained(peft_model_id, language='german', task='transcribe')

OSError: Can't load tokenizer for 'Flurin17/whisper-large-v3-peft-swiss-german'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'Flurin17/whisper-large-v3-peft-swiss-german' is the correct path to a directory containing all relevant files for a WhisperTokenizer tokenizer.

Could it be that some files are missing from your upload?

Thanks and kind regards

Owner

Hi Eyal

I made a mistake, apologies. Here is a working example:

import torch
from peft import PeftConfig, PeftModel
from transformers import (
    AutomaticSpeechRecognitionPipeline,
    WhisperFeatureExtractor,
    WhisperForConditionalGeneration,
    WhisperTokenizer,
)

# The PEFT repo contains only the adapter weights, so the tokenizer and
# feature extractor have to be loaded from the base model.
model_name_or_path = "openai/whisper-large-v3"
task = "transcribe"

feature_extractor = WhisperFeatureExtractor.from_pretrained(model_name_or_path)
tokenizer = WhisperTokenizer.from_pretrained(model_name_or_path, task=task)

# Load the base model in 8-bit and attach the LoRA adapter on top of it.
peft_model_id = "Flurin17/whisper-large-v3-peft-swiss-german"
peft_config = PeftConfig.from_pretrained(peft_model_id)
model = WhisperForConditionalGeneration.from_pretrained(
    peft_config.base_model_name_or_path, load_in_8bit=True, device_map="auto"
)
model = PeftModel.from_pretrained(model, peft_model_id)
model.config.use_cache = True

pipe = AutomaticSpeechRecognitionPipeline(
    model=model, tokenizer=tokenizer, feature_extractor=feature_extractor
)

with torch.cuda.amp.autocast():
    result = pipe(r"L:\random\audio.mp3", generate_kwargs={"language": "german"})
print(result["text"])
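For context, the reason the original errors appeared: a PEFT upload typically holds only the adapter files (`adapter_config.json` plus the adapter weights), not `preprocessor_config.json` or the tokenizer files, so those must be loaded from the base repo. A minimal sketch (the config values below are hypothetical, not copied from this repo) of how `adapter_config.json` points back to the base model:

```python
import json

# Hypothetical contents of adapter_config.json in a PEFT repo;
# real repos contain more keys, but base_model_name_or_path is the
# one that PeftConfig uses to find the base model.
adapter_config_json = json.dumps({
    "base_model_name_or_path": "openai/whisper-large-v3",
    "peft_type": "LORA",
    "r": 32,
})

# The base model ID is where the tokenizer and feature extractor live.
base_id = json.loads(adapter_config_json)["base_model_name_or_path"]
print(base_id)  # openai/whisper-large-v3
```

This is also what `PeftConfig.from_pretrained(peft_model_id)` does for you: `peft_config.base_model_name_or_path` gives the same ID.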

Hi Flurin

Thanks a lot for the correction. It works now!
By the way, I'd have a few more questions for you :) Would it be okay to contact you privately? Or could you maybe quickly write me a short email?
REDACTED

Thanks a lot and have a good one! ^^
