whisper-small-dv is an Automatic Speech Recognition (ASR) model, a fine-tuned version of OpenAI's Whisper small trained on the Mozilla Common Voice 13.0 dataset. The model transcribes spoken language into written text, making it useful for a wide range of applications, from transcription services to voice assistants.
The model was trained using the PyTorch framework and the Transformers library. Training metrics and visualizations can be viewed on TensorBoard.
The model's performance was evaluated on a held-out test set. The evaluation metrics and results can be found in the "Eval Results" section.
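ASR evaluations such as the one in the "Eval Results" section are conventionally reported as word error rate (WER): the word-level edit distance between the reference and predicted transcripts, divided by the number of reference words. As an illustrative sketch (the `wer` helper below is not part of this model's codebase; libraries such as `jiwer` or `evaluate` are normally used instead):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between first i reference words and first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub_cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, one substituted word out of three reference words yields a WER of 1/3.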
The model can be used for any ASR task. To use the model, you can load it using the Transformers library:
```python
import torch
import librosa  # used here to load and resample the audio
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the model and processor (Whisper is a sequence-to-sequence model,
# so the Whisper classes are used rather than CTC-based ones)
model = WhisperForConditionalGeneration.from_pretrained("Ryukijano/whisper-small-dv")
processor = WhisperProcessor.from_pretrained("Ryukijano/whisper-small-dv")

# Load the audio at 16 kHz, the sampling rate Whisper expects
speech, sampling_rate = librosa.load("path_to_audio_file", sr=16000)

# Use the model for ASR
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```
This model is released under the MIT license.