---
library_name: peft
base_model: openai/whisper-large-v2
datasets:
- mozilla-foundation/common_voice_16_0
language:
- ja
metrics:
- wer
---

# Model Card for openai-whisper-large-v2-LORA-ja

A LoRA adapter for Japanese transcription on top of Whisper Large V2. Testing is still in progress; the main personal use case is transcribing Japanese comedy. Inference uses roughly 9 GB of VRAM with this LoRA.

## Model Details

### Model Description

openai-whisper-large-v2-LORA-ja

- **Developed by:** FZNX
- **Model type:** PEFT LoRA adapter
- **Language(s) (NLP):** Japanese (fine-tuned on Common Voice 16)
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** Whisper Large V2

## How to Get Started with the Model

```python
import torch
from transformers import (
    AutomaticSpeechRecognitionPipeline,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)
from peft import PeftModel, PeftConfig

peft_model_id = "fznx92/openai-whisper-large-v2-ja-transcribe-colab"
sample = "insert mp3 file location here"
language = "japanese"
task = "transcribe"

# Load the base Whisper model, then attach the LoRA adapter on top of it.
peft_config = PeftConfig.from_pretrained(peft_model_id)
model = WhisperForConditionalGeneration.from_pretrained(peft_config.base_model_name_or_path)
model = PeftModel.from_pretrained(model, peft_model_id)
model.to("cuda").half()

# The processor supplies the tokenizer and feature extractor used by the pipeline.
processor = WhisperProcessor.from_pretrained(peft_config.base_model_name_or_path, language=language, task=task)

pipe = AutomaticSpeechRecognitionPipeline(
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    batch_size=8,
    torch_dtype=torch.float16,
    device="cuda:0",
)


def transcribe(audio, return_timestamps=False):
    # Split long audio into 30-second chunks, the input length Whisper expects.
    text = pipe(
        audio,
        chunk_length_s=30,
        return_timestamps=return_timestamps,
        generate_kwargs={"language": language, "task": task},
    )["text"]
    return text


transcript = transcribe(sample)
print(transcript)
```

### Training Data

Common Voice 16 (Japanese subset).

### Training Procedure

Fine-tuned on Google Colab (T4 GPU) for approximately 6 hours. An illustrative sketch of a typical LoRA training setup is included at the end of this card.

## Evaluation

Evaluation is still in progress; word error rate (WER) is the target metric. An illustrative example of computing WER is sketched at the end of this card.

### Framework versions

- PEFT 0.7.1
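
### Example: LoRA training setup (illustrative)

The exact training hyperparameters are not documented in this card. The sketch below shows one common way to attach a LoRA adapter to Whisper Large V2 with PEFT; the rank, alpha, target modules, and dropout values are assumptions for illustration, not the settings actually used for this adapter.

```python
# Illustrative sketch only: these LoRA hyperparameters are assumptions,
# not the documented settings used to train this adapter.
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")

lora_config = LoraConfig(
    r=32,                                  # assumed rank
    lora_alpha=64,                         # assumed scaling factor
    target_modules=["q_proj", "v_proj"],   # common choice for Whisper attention layers
    lora_dropout=0.05,                     # assumed dropout
    bias="none",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only a small fraction of the weights are trainable
```

`print_trainable_parameters()` makes the point of LoRA visible: only a small fraction of the model's weights are trained, which is what keeps a Colab-class GPU sufficient for fine-tuning.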
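
### Example: computing WER (illustrative)

No WER results are reported yet. The snippet below is a minimal sketch of how WER could be computed with the Hugging Face `evaluate` library; the reference and prediction strings are hypothetical placeholders, and in practice the predictions would come from the `transcribe()` helper shown above.

```python
# Minimal WER sketch using the Hugging Face `evaluate` library.
# The reference and prediction strings are hypothetical placeholders.
import evaluate

wer_metric = evaluate.load("wer")

references = ["今日 は いい 天気 です"]    # ground-truth transcript, pre-segmented into tokens
predictions = ["今日 は いい 天気 でした"]  # model output, segmented the same way

wer = wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.3f}")
```

Japanese has no whitespace word boundaries, so transcripts are usually segmented first (for example character by character or with a morphological tokenizer) before computing WER; otherwise each utterance is scored as a single "word".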