---
license: mit
---
The model was fine-tuned on 300 hours of public and private Romanian speech data. More details will be made available once the underlying paper is published.
```python
import librosa
import torch
from transformers import Wav2Vec2Processor, AutoModelForCTC

# Load the audio at the 16 kHz sampling rate expected by the model
audio, _ = librosa.load("[audio_path]", sr=16000)

model = AutoModelForCTC.from_pretrained("racai/wav2vec2-base-100k-voxpopuli-romanian")
processor = Wav2Vec2Processor.from_pretrained("racai/wav2vec2-base-100k-voxpopuli-romanian")

input_dict = processor(audio, sampling_rate=16000, return_tensors="pt")

with torch.inference_mode():
    logits = model(input_dict.input_values).logits

# Greedy decoding: take the most likely token at each frame,
# then let the processor collapse repeats and remove CTC blanks
predicted_ids = torch.argmax(logits, dim=-1)
predicted_sentence = processor.batch_decode(predicted_ids)[0]
print("Prediction:", predicted_sentence)
```
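For intuition, `processor.batch_decode` applies the standard CTC collapse rule to the frame-level ids: consecutive repeats are merged, then blank tokens are dropped. A minimal self-contained sketch of that rule, using a hypothetical toy vocabulary and blank id (not the model's actual ones):

```python
# Toy illustration of CTC collapse: merge repeated ids, then drop blanks.
# BLANK and VOCAB are hypothetical stand-ins, not the model's real vocabulary.
BLANK = 0
VOCAB = {1: "a", 2: "b", 3: "c"}

def ctc_collapse(ids):
    out = []
    prev = None
    for i in ids:
        # emit a character only when the id changes and is not the blank
        if i != prev and i != BLANK:
            out.append(VOCAB[i])
        prev = i
    return "".join(out)

# [1, 1, 0, 1, 2, 2, 0, 3] -> "aabc": the repeated 1s merge, the blank
# between them lets the second "a" through, and repeated 2s merge to one "b"
print(ctc_collapse([1, 1, 0, 1, 2, 2, 0, 3]))
```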