ConMamba-small-ca
Summary
ConMamba-small-ca is an acoustic model for automatic speech recognition (ASR) in Catalan. It is based on the ConMamba architecture, which uses a Mamba (state space model) encoder augmented with convolutions for efficient sequence processing.
Model Description
The ConMamba-small-ca model implements the Convolution-augmented Mamba (ConMamba) architecture, an adaptation of State Space Models (SSMs) designed to improve performance and efficiency in speech recognition tasks by integrating convolutional layers.
This model was trained specifically for Catalan, on a corpus of 4929 hours of speech.
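To make the two ingredients described above more concrete, here is a toy, single-channel illustration in plain Python: a causal 1-D convolution followed by a linear state-space recurrence whose step size depends on the input (the "selective" idea behind Mamba). This is not the model's actual layer. The function names, the parameter values (A, B, C, w_delta, the kernel) and the order convolution, then scan, then residual are illustrative assumptions; the real model relies on the optimized mamba-ssm and causal-conv1d kernels with learned, multi-channel parameters.

```python
import math
from typing import List, Sequence


def causal_conv1d(x: List[float], kernel: List[float]) -> List[float]:
    """Causal 1-D convolution: output t only sees inputs t-k+1..t (zero-padded on the left)."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def softplus(v: float) -> float:
    # Smooth positive function used here to keep the step size delta > 0.
    return math.log1p(math.exp(v))


def selective_scan(x: List[float], A: float = -1.0, B: float = 1.0,
                   C: float = 1.0, w_delta: float = 1.0) -> List[float]:
    """Toy single-channel selective SSM: the step size delta_t depends on the input x_t."""
    h = 0.0
    ys = []
    for x_t in x:
        delta = softplus(w_delta * x_t)  # input-dependent step size ("selectivity")
        a_bar = math.exp(delta * A)      # discretized state transition; A < 0 gives decay
        b_bar = delta * B                # simplified discretized input weight
        h = a_bar * h + b_bar * x_t      # linear recurrence, one step per frame: O(T)
        ys.append(C * h)
    return ys


def toy_conmamba_block(x: List[float], kernel: Sequence[float] = (0.5, 0.3, 0.2)) -> List[float]:
    """Hypothetical toy block: causal conv, then selective scan, then a residual connection."""
    conv = causal_conv1d(x, list(kernel))
    ssm = selective_scan(conv)
    return [xi + yi for xi, yi in zip(x, ssm)]
```

Because both the convolution and the recurrence in this toy only look backwards, changing a later input never changes earlier outputs, and the scan runs in time linear in the sequence length rather than the quadratic cost of self-attention.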
Intended Uses and Limitations
This model can be used for automatic speech recognition (ASR) in Catalan. It is intended to transcribe Catalan audio files into plain text without punctuation.
How to Get Started with the Model
Installation
The ConMamba-small-ca implementation depends on specialized libraries such as mamba-ssm and causal-conv1d. We recommend following the installation steps from the original Mamba ASR repository:
- Create a virtual environment (for example, mamba_asr):
conda create --name mamba_asr python=3.9
conda activate mamba_asr
- Install the dependencies:
git clone https://github.com/langtech-bsc/ConMamba_ASR
cd ConMamba_ASR
pip install -r requirements.txt
# Make sure that the versions of torch, torchaudio, causal-conv1d, and mamba-ssm are compatible with your hardware.
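Before troubleshooting compatibility, it can help to see which versions of the key packages are actually installed in the active environment. The sketch below uses Python's standard importlib.metadata module. The helper name report_versions is hypothetical, and the package list simply mirrors the comment in the installation commands. It only reports versions; it does not decide which combinations are compatible with your GPU or CUDA setup.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names as published on PyPI (assumed to match what requirements.txt installs).
PACKAGES = ("torch", "torchaudio", "causal-conv1d", "mamba-ssm")


def report_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Return {package: installed version string, or None if it is not installed}."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in report_versions().items():
        print(f"{name:15s} {ver or 'NOT INSTALLED'}")
```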
For Inference
Inference is run with the run_inference.py script included in the repository:
- Define paths: set the paths to the repository, the input audio file, and the inference configuration (YAML) file.
- Run inference: execute the script with the defined paths.
# Define your paths
REPO_PATH="/path/to/ConMamba_ASR"
AUDIO="/path/to/your/audio.wav"
HPARAMS="conmambamamba_debug_catalan_small_1k_unigram_inference.yaml" # Use your specific inference YAML
# Run the inference script (quoting the variables keeps paths with spaces intact)
python "$REPO_PATH/run_inference.py" \
    --hparams "$HPARAMS" \
    --audio "$AUDIO"
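For scripting (for example, transcribing many files from Python), the same command can be wrapped with the standard subprocess module. This is a hedged sketch: the helper names are hypothetical, it passes only the two options shown in the shell example (--hparams and --audio), and because this card does not document what run_inference.py prints, transcribe() simply returns the script's standard output.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_inference_command(repo_path: str, hparams: str, audio: str) -> List[str]:
    """Build the same command line as the shell example, using the current interpreter."""
    script = Path(repo_path) / "run_inference.py"
    return [sys.executable, str(script), "--hparams", hparams, "--audio", audio]


def transcribe(repo_path: str, hparams: str, audio: str) -> str:
    """Run run_inference.py on one file and return its (stripped) standard output."""
    cmd = build_inference_command(repo_path, hparams, audio)
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, transcribe("/path/to/repo", "my_inference.yaml", "/path/to/your/audio.wav") runs the script with sys.executable, so it should be called from inside the environment created during installation.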
Dev Result - WER: 8.6
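Word error rate (WER) is the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the model output, divided by the number of reference words. The sketch below computes it in plain Python. It assumes the figures in this card are percentages (the card does not state the unit) and uses simple whitespace tokenization with no text normalization, so it may not reproduce the exact scoring used for the reported numbers. corpus_wer pools errors and reference words over all utterances, which is how corpus-level WER is usually reported, rather than averaging per-utterance scores.

```python
from typing import Iterable, List, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (costs 0 if the words match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """WER in percent for a single utterance: 100 * (S + D + I) / N."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER in percent: total errors / total reference words."""
    errors = words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        errors += edit_distance(ref, hypothesis.split())
        words += len(ref)
    if words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / words
```

For example, wer("el gat dorm al sofà", "el gat dormia sofà") gives 40.0: one substitution (dorm → dormia) and one deletion (al) over five reference words.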
Training Details
Training data
The model was trained on a total of 4929 hours of speech from the following sources:
- Parlament-Parla-v3 (only the anonymized version of this dataset is public; the model was trained on the non-anonymized version)
- Corts Valencianes (only the anonymized version of this dataset is public; the model was trained on the non-anonymized version)
- 3cat
- IB3 (The datasets will be made accessible shortly.)
- Common Voice ca 17 Benchmark
Training Hyperparameters
- Training hours: 4929
- language: catalan
- number_of_epochs: 110
- batch_size: 30
- ctc_weight: 0.6
- grad_accumulation_factor: 1
- max_grad_norm: 5.0
- loss_reduction: 'batchmean'
- sorting: random
- num_workers: 8
- precision: bf16
- avg_checkpoints: 10
- lr_adam: 0.001
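Two of the hyperparameters above are easiest to read as formulas. Assuming ctc_weight follows the usual hybrid CTC/sequence-to-sequence convention of SpeechBrain-style recipes, the training loss is ctc_weight × L_ctc + (1 − ctc_weight) × L_seq, so 0.6 puts slightly more weight on the CTC branch. Assuming avg_checkpoints: 10 has its usual meaning, the parameters of 10 saved checkpoints are averaged element-wise to form the final model; which 10 (for example, the latest or the best on validation) is not specified in this card. The sketch below illustrates both in plain Python; the function names are hypothetical and parameters are flat lists of floats rather than tensors.

```python
from typing import Dict, List


def joint_ctc_seq2seq_loss(loss_ctc: float, loss_seq: float, ctc_weight: float = 0.6) -> float:
    """Interpolate the CTC loss and the decoder (sequence-to-sequence) loss."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_seq


def average_checkpoints(checkpoints: List[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Element-wise mean of each named parameter across checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged: Dict[str, List[float]] = {}
    for name in checkpoints[0]:
        tensors = [ckpt[name] for ckpt in checkpoints]
        if len({len(t) for t in tensors}) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        averaged[name] = [sum(vals) / n for vals in zip(*tensors)]
    return averaged
```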
Citation
If this model contributes to your research, please cite the work:
@inproceedings{zevallos2025conmambasmallca,
title={Evaluating High-Performance and Lightweight ASR Systems for Catalan},
author={Zevallos, Rodolfo},
organization={Barcelona Supercomputing Center},
year={2025}
}
Additional Information
Author
The model was trained in September 2025 at the Language Technologies Laboratory of the Barcelona Supercomputing Center by Rodolfo Zevallos.
Contact
For further information, please send an email to bsc-lt@bsc.es.
Copyright
Copyright (c) 2025 by Language Technologies Laboratory, Barcelona Supercomputing Center.
License
Funding
This work has been promoted and financed by the Generalitat de Catalunya through the Aina project.
The conversion of the model was possible thanks to the computing time provided by Barcelona Supercomputing Center through MareNostrum 5.
Evaluation results
- Test WER on 3cat-parla-asr test set (self-reported): 2.810
- Test WER on parla_clean test set (self-reported): 5.470
- Test WER on parla_other test set (self-reported): 12.050
- Test WER on corts_clean_anon test set (self-reported): 20.090
- Test WER on corts_other_anon test set (self-reported): 36.910
- Test WER on central_male test set (self-reported): 7.730
- Test WER on central_female test set (self-reported): 6.840
- Test WER on valencia_male test set (self-reported): 8.300