ConMamba-small-ca


Model Description
Intended Uses and Limitations
How to Get Started with the Model
Training Details
Citation
Additional Information

Summary

ConMamba-small-ca is an acoustic model for Automatic Speech Recognition (ASR) in Catalan. It is based on the ConMamba architecture, which uses a Mamba (State Space Model) encoder augmented with convolutions for efficient sequence processing.

Model Description

The ConMamba-small-ca model implements the Convolution-augmented Mamba (ConMamba) architecture, an adaptation of State Space Models (SSMs) designed to improve performance and efficiency in speech recognition tasks by integrating convolutional layers.

The model was trained specifically for Catalan on a corpus of 4929 hours of audio.

Intended Uses and Limitations

This model can be used for Automatic Speech Recognition (ASR) in Catalan. The model is intended to transcribe audio files in Catalan to plain text without punctuation.
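Because the output is plain text without punctuation, reference transcripts usually need the same treatment before they are compared with the model's hypotheses. The following is a minimal sketch of such a normalization. The exact convention used for this model is not documented here, so the lowercasing step and the set of characters kept inside words (apostrophe, hyphen, and the Catalan middle dot of "l·l") are assumptions, and `normalize_transcript` is a hypothetical helper name.

```python
import re
import unicodedata


def normalize_transcript(text: str) -> str:
    """Lowercase and strip punctuation (assumed convention, not the documented one)."""
    text = unicodedata.normalize("NFC", text).lower()
    # Treat the typographic apostrophe like the plain one.
    text = text.replace("\u2019", "'")
    # Replace everything except letters, digits, whitespace and the
    # word-internal characters ' · - with a space.
    text = re.sub(r"[^\w\s'·-]", " ", text)
    # Drop ' · - when they are not surrounded by word characters,
    # e.g. a free-standing dash or a quote mark around a word.
    text = re.sub(r"(?<!\w)['·-]+|['·-]+(?!\w)", " ", text)
    # Collapse runs of whitespace.
    return " ".join(text.split())
```

With this sketch, a sentence such as "Bon dia, Col·legi d'Osona!" keeps the middle dot and the apostrophe but loses the comma and the exclamation mark.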

How to Get Started with the Model

Installation

The ConMamba implementation behind this model depends on specific libraries such as mamba-ssm and causal-conv1d. We recommend following the installation steps from the original Mamba ASR repository:

Create a virtual environment (mamba_asr, for example):

conda create --name mamba_asr python=3.9
conda activate mamba_asr

Install dependencies:

git clone https://github.com/langtech-bsc/ConMamba_ASR
cd ConMamba_ASR
pip install -r requirements.txt
# Make sure that the versions of torch, torchaudio, causal-conv1d, and mamba-ssm are compatible with your hardware.
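To see which versions actually ended up in the environment, a short check along these lines may help. It is only a sketch: it reports the installed versions of the four packages named above using the standard library, and it does not decide whether a given combination is compatible with your GPU or CUDA toolkit. The helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Package names taken from the installation note above.
REQUIRED = ("torch", "torchaudio", "causal-conv1d", "mamba-ssm")


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```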

For Inference

Inference is performed using the dedicated run_inference.py script provided within the repository.

Define Paths: Set the paths for the repository, the input audio, and the specific configuration file for inference.
Execute Inference: Run the script using the defined paths.

# Define your paths
REPO_PATH="/path/to/ConMamba_ASR"
AUDIO="/path/to/your/audio.wav"
HPARAMS="conmambamamba_debug_catalan_small_1k_unigram_inference.yaml" # Use your specific inference YAML

# Execute inference script
python "$REPO_PATH/run_inference.py" \
  --hparams "$HPARAMS" \
  --audio "$AUDIO"
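To transcribe several files, the same command can be driven from Python. The sketch below only reuses the two flags shown above (`--hparams` and `--audio`). It assumes that run_inference.py prints the transcription to standard output, which is not confirmed here, and the helper names `build_command` and `transcribe_folder` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_command(repo_path, hparams, audio):
    """Build the argument list for one run_inference.py call, mirroring the shell example."""
    return [
        sys.executable,
        str(Path(repo_path) / "run_inference.py"),
        "--hparams", str(hparams),
        "--audio", str(audio),
    ]


def transcribe_folder(repo_path, hparams, audio_dir, pattern="*.wav"):
    """Run inference once per matching file and return {file name: captured stdout}."""
    outputs = {}
    for audio in sorted(Path(audio_dir).glob(pattern)):
        result = subprocess.run(
            build_command(repo_path, hparams, audio),
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if the script fails
        )
        # Assumption: the transcription is written to stdout.
        outputs[audio.name] = result.stdout.strip()
    return outputs
```

Each call starts a new process and therefore reloads the model, so this approach favors simplicity over throughput.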

Dev Result - WER: 8.6

Training Details

Training data

The model was trained on a total of 4929 hours of audio, including:

Parlament-Parla-v3 (Only the anonymized version of the dataset is public. We trained the model with the non-anonymized version.)
Corts Valencianes (Only the anonymized version of the dataset is public. We trained the model with the non-anonymized version.)
3cat
IB3 (The datasets will be made accessible shortly.)
Common Voice ca 17 Benchmark

Training Hyperparameters

Training hours: 4929
language: catalan
number_of_epochs: 110
batch_size: 30
ctc_weight: 0.6
grad_accumulation_factor: 1
max_grad_norm: 5.0
loss_reduction: 'batchmean'
sorting: random
num_workers: 8
precision: bf16
avg_checkpoints: 10
lr_adam: 0.001
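Two of these values interact in a way that a small calculation can make concrete. The sketch below assumes the common hybrid CTC/attention objective, in which `ctc_weight` interpolates between a CTC loss and an attention-decoder loss; this card does not confirm that the model is trained this way, so treat the formula as an illustration of what the parameter usually means. The function names are hypothetical.

```python
def joint_loss(ctc_loss: float, attention_loss: float, ctc_weight: float = 0.6) -> float:
    """Interpolate two losses: ctc_weight * CTC + (1 - ctc_weight) * attention (assumed objective)."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


def effective_batch_size(batch_size: int = 30, grad_accumulation_factor: int = 1) -> int:
    """Number of examples contributing to one optimizer step on a single device."""
    return batch_size * grad_accumulation_factor
```

With the values listed above, the CTC term would carry 60% of the total loss, and each optimizer step on one device would see 30 examples because gradients are not accumulated.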

Citation

If this model contributes to your research, please cite the work:

@inproceedings{zevallos2025conmambasmallca,
  title={Evaluating High-Performance and Lightweight ASR Systems for Catalan},
  author={Zevallos, Rodolfo},
  organization={Barcelona Supercomputing Center},
  year={2025}
}

Additional Information

Author

The model was trained in September 2025 at the Language Technologies Laboratory of the Barcelona Supercomputing Center by Rodolfo Zevallos.

Contact

For further information, please send an email to bsc-lt@bsc.es.

Copyright

License

GPL-3.0

Funding

This work has been promoted and financed by the Generalitat de Catalunya through the Aina project.

The conversion of the model was made possible by the computing time provided by the Barcelona Supercomputing Center through MareNostrum 5.


Evaluation results

Test WER per test set (self-reported):

3cat-parla-asr: 2.810
parla_clean: 5.470
parla_other: 12.050
corts_clean_anon: 20.090
corts_other_anon: 36.910
central_male: 7.730
central_female: 6.840
valencia_male: 8.300
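For readers who want to reproduce this kind of figure on their own data, the following is a minimal sketch of word error rate (WER) for a single utterance: the word-level edit distance between reference and hypothesis divided by the number of reference words, expressed as a percentage. It assumes whitespace tokenization and already-normalized text. The scoring tool and normalization behind the numbers reported here are not specified, so results from this sketch may differ from them.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent for one reference/hypothesis pair."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

For a whole test set, WER is normally computed by summing the edit distances over all utterances and dividing by the total number of reference words, rather than by averaging per-utterance values.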
