Error from speechbrain.pretrained import ASRCNNTransducer

#1
by Joan1949 - opened

Hello, when I try to execute "from speechbrain.pretrained import ASRCNNTransducer" I get an error.

Could you help me, please?

Hello @Joan1949,
Can you share the exact error?

Hi, the error is this:

ModuleNotFoundError Traceback (most recent call last)
in <cell line: 9>()
7 from torch.utils.data import DataLoader
8 import speechbrain as sb
----> 9 from speechbrain.pretrained import ASRCNNTransducer
10
11 # Hyperparameter configuration

ModuleNotFoundError: No module named 'speechbrain.pretrained'
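
This error is what one would expect if the installed SpeechBrain release does not ship a `speechbrain.pretrained` package; as far as I know, the 1.0 release moved those interfaces under `speechbrain.inference`. A minimal sketch of an import that tolerates both layouts follows. The helper name `import_first_available` is hypothetical, and the two module paths are assumptions to verify against the installed version.

```python
import importlib


def import_first_available(candidates):
    """Return the first module in `candidates` that imports successfully.

    Raises ImportError listing every path tried if none can be imported.
    """
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            errors.append(f"{name}: {exc}")
    raise ImportError(
        "none of the candidates could be imported: " + "; ".join(errors)
    )


# Assumed module paths: the newer layout first, then the pre-1.0 layout.
SPEECHBRAIN_CANDIDATES = ["speechbrain.inference", "speechbrain.pretrained"]
```

Calling `import_first_available(SPEECHBRAIN_CANDIDATES)` returns whichever module is present. The class loaded from it still has to exist in that release, which this sketch does not check.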

Do pip install speechbrain==0.5.26

Hi, now I get this error:

ERROR: Could not find a version that satisfies the requirement speechbrain==0.5.26 (from versions: 0.5.4, 0.5.5, 0.5.6, 0.5.7, 0.5.8, 0.5.9, 0.5.10, 0.5.11, 0.5.12, 0.5.13, 0.5.14, 0.5.15, 0.5.16, 1.0.0)
ERROR: No matching distribution found for speechbrain==0.5.26
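
pip's message lists every published release, so a valid pin can be read off it directly. As a sketch, the hypothetical helper `latest_matching` below takes that list (copied from the error above) and returns the highest release with a given prefix. It only handles plain dotted integers and is not a substitute for a real version parser.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '0.5.16' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def latest_matching(available, prefix):
    """Return the highest version whose leading components equal `prefix`."""
    wanted = parse_version(prefix)
    matches = [v for v in available if parse_version(v)[: len(wanted)] == wanted]
    if not matches:
        raise ValueError(f"no release starts with {prefix}")
    return max(matches, key=parse_version)


# Versions listed in the pip error message above.
AVAILABLE = (
    "0.5.4, 0.5.5, 0.5.6, 0.5.7, 0.5.8, 0.5.9, 0.5.10, 0.5.11, "
    "0.5.12, 0.5.13, 0.5.14, 0.5.15, 0.5.16, 1.0.0"
).split(", ")

print(latest_matching(AVAILABLE, "0.5"))  # prints 0.5.16
```

Comparing integer tuples rather than strings matters here: as strings, "0.5.9" would sort above "0.5.16".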

This is my code:

import os
import torch
from torch import optim
from speechbrain.pretrained import ASRCNNTransducer
from speechbrain.tokenizers.SentencePiece import SentencePiece
from speechbrain.dataio.batch import PaddedBatch
from torch.utils.data import DataLoader

# Hyperparameter configuration

learning_rate = 1e-4
num_epochs = 10
batch_size = 8
model_checkpoint = "speechbrain/asr-crdnn-commonvoice-14-es"
dataset_folder = "dataset"  # Folder containing all the files

# Load the pre-trained model

asr_model = ASRCNNTransducer.from_hparams(source=model_checkpoint, savedir="pretrained_model")

# Optimizer

optimizer = optim.Adam(asr_model.parameters(), lr=learning_rate)

# Loss function (adjust it to your needs)

criterion = torch.nn.CTCLoss(blank=asr_model.tokenizer.tokenizer.pad_id, reduction='mean')

# Function to load the text and audio files

def load_data(folder):
    audio_files = []
    text_data = {}
    for filename in os.listdir(folder):
        if filename.endswith(".wav"):
            audio_files.append(os.path.join(folder, filename))
        elif filename.endswith(".txt"):
            with open(os.path.join(folder, filename), "r", encoding="utf-8") as file:
                text = file.read().strip()
                basename = os.path.splitext(filename)[0]
                text_data[basename] = text
    return audio_files, text_data

# Load text and audio files

audio_files, text_data = load_data(dataset_folder)

# Combine audio and text

dataset = [(audio_file, text_data[os.path.splitext(os.path.basename(audio_file))[0]]) for audio_file in audio_files]

# DataLoader

dataloader = DataLoader(dataset, batch_size=batch_size, collate_fn=PaddedBatch)

# Model training

for epoch in range(num_epochs):
    asr_model.train()
    total_loss = 0.0
    for audio_paths, transcriptions in dataloader:
        # Here you should implement the logic to load the audio and text files,
        # and then use them to train the model.
        # This will include reading the audio, converting it to the model's input features, etc.
        optimizer.zero_grad()
        logits = asr_model(inputs)
        logits = logits.transpose(1, 0)  # Transpose logits to match the shape expected by CTCLoss
        loss = criterion(logits, targets, input_lens, target_lens)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"Epoch {epoch+1}, Loss: {total_loss}")

# Save the trained model

torch.save(asr_model.state_dict(), "trained_model.pth")
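
One fragile spot in the script above is the pairing step: the `text_data[...]` lookup raises `KeyError` as soon as a `.wav` file has no transcript with the same base name. Below is a minimal sketch of a more defensive pairing that uses only the standard library. The function name `pair_audio_with_text` and the decision to report unmatched files instead of failing are assumptions, not part of the original script.

```python
from pathlib import Path


def pair_audio_with_text(folder):
    """Pair each .wav in `folder` with the .txt that shares its base name.

    Returns (pairs, unmatched): `pairs` is a list of (audio_path, transcript)
    tuples sorted by file name, `unmatched` lists .wav paths with no .txt.
    """
    folder = Path(folder)
    pairs, unmatched = [], []
    for wav in sorted(folder.glob("*.wav")):
        txt = wav.with_suffix(".txt")
        if txt.is_file():
            transcript = txt.read_text(encoding="utf-8").strip()
            pairs.append((str(wav), transcript))
        else:
            unmatched.append(str(wav))
    return pairs, unmatched
```

With this, `pairs, unmatched = pair_audio_with_text("dataset")` could stand in for the `load_data` call and the list comprehension, and `unmatched` shows which recordings still need a transcript.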

Sorry, my bad, it is speechbrain==0.5.16
