ConMamba-small-ca

Summary

ConMamba-small-ca is an acoustic model for Automatic Speech Recognition (ASR) in Catalan. It is based on the ConMamba architecture, which uses a Mamba (State Space Model) encoder augmented with convolutions for efficient sequence processing.

Model Description

The ConMamba-small-ca model implements the Convolution-augmented Mamba (ConMamba) architecture, an adaptation of State Space Models (SSMs) designed to improve performance and efficiency in speech recognition tasks by integrating convolutional layers.

This model was trained specifically for Catalan, on a corpus of 4929 hours of audio.

Intended Uses and Limitations

This model can be used for Automatic Speech Recognition (ASR) in Catalan. The model is intended to transcribe audio files in Catalan to plain text without punctuation.

How to Get Started with the Model

Installation

The ConMamba implementation depends on specific libraries such as mamba-ssm and causal-conv1d. It is recommended to follow the installation steps from the original Mamba ASR repository:

  1. Create a virtual environment (mamba_asr, for example):
    conda create --name mamba_asr python=3.9
    conda activate mamba_asr
    
  2. Install dependencies:
    git clone https://github.com/langtech-bsc/ConMamba_ASR
    cd ConMamba_ASR
    pip install -r requirements.txt
    # Make sure that the versions of torch, torchaudio, causal-conv1d, and mamba-ssm are compatible with your hardware.
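After installation, a quick check that the key packages are visible to the active environment can save time before running inference. The following is a minimal sketch, not part of the repository: the helper `find_missing` is hypothetical, and the import names `causal_conv1d` and `mamba_ssm` are assumed from the usual mapping of the PyPI names causal-conv1d and mamba-ssm. It only checks that the packages are present; it does not verify that their versions are compatible with your hardware.

```python
import importlib.util

# Import names are assumptions based on the usual PyPI-to-module mapping
# (mamba-ssm -> mamba_ssm, causal-conv1d -> causal_conv1d).
REQUIRED_MODULES = ["torch", "torchaudio", "causal_conv1d", "mamba_ssm"]


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

If anything is reported missing, reinstalling from requirements.txt inside the activated environment is the first thing to try.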
    

For Inference

Inference is performed using the dedicated run_inference.py script provided within the repository.

  1. Define Paths: Set the paths for the repository, the input audio, and the specific configuration file for inference.
  2. Execute Inference: Run the script using the defined paths.
# Define your paths
REPO_PATH="/path/to/ConMamba_ASR"
AUDIO="/path/to/your/audio.wav"
HPARAMS="conmambamamba_debug_catalan_small_1k_unigram_inference.yaml" # Use your specific inference YAML

# Execute inference script
python "$REPO_PATH/run_inference.py" \
  --hparams "$HPARAMS" \
  --audio "$AUDIO"
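To transcribe several files, the same command can be driven from Python. This is a sketch under assumptions: `build_command` and `transcribe_directory` are hypothetical helpers, and the only flags used are `--hparams` and `--audio`, as in the command above. How the script reports its transcription (standard output or a file) is not specified here, so the sketch simply runs it once per file.

```python
import subprocess
import sys
from pathlib import Path


def build_command(repo_path, hparams, audio):
    """Build the run_inference.py command line for one audio file."""
    script = Path(repo_path) / "run_inference.py"
    return [sys.executable, str(script),
            "--hparams", str(hparams),
            "--audio", str(audio)]


def transcribe_directory(repo_path, hparams, audio_dir):
    """Run the inference script once per .wav file in `audio_dir`."""
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(build_command(repo_path, hparams, wav), check=True)
```

Calling `transcribe_directory("/path/to/ConMamba_ASR", "your_inference.yaml", "/path/to/wavs")` would process the files in alphabetical order. Because each call starts a new process, the model is presumably reloaded for every file, which is simple but slow for large batches.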

Dev Result - WER: 8.6
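For readers who want to compare their own transcriptions against references, word error rate (WER) is conventionally the word-level edit distance divided by the number of reference words. The function below is a minimal sketch of that standard definition, assuming whitespace tokenization and a result expressed as a percentage; `word_error_rate` is a hypothetical helper, and the evaluation script behind the figure above may normalize text differently.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent between a reference and a hypothesis string."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

For a whole test set, WER is usually computed by summing the errors and the reference word counts over all utterances, rather than averaging per-utterance values.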

Training Details

Training data

The model was trained on a total of 4929 hours of audio.

Training Hyperparameters

  • Training hours: 4929
  • language: catalan
  • number_of_epochs: 110
  • batch_size: 30
  • ctc_weight: 0.6
  • grad_accumulation_factor: 1
  • max_grad_norm: 5.0
  • loss_reduction: 'batchmean'
  • sorting: random
  • num_workers: 8
  • precision: bf16
  • avg_checkpoints: 10
  • lr_adam: 0.001
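Two of these settings are easier to read with a small worked example. The sketch below assumes that `ctc_weight` interpolates a CTC loss with an attention-decoder (sequence-to-sequence) loss, as is common in hybrid CTC/attention recipes such as those in SpeechBrain; the model card does not state this explicitly, and both function names are hypothetical.

```python
def joint_loss(ctc_loss, seq2seq_loss, ctc_weight=0.6):
    """Interpolate a CTC loss and a seq2seq loss (assumed hybrid objective)."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be between 0 and 1")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * seq2seq_loss


def effective_batch_size(batch_size=30, grad_accumulation_factor=1):
    """Number of examples contributing to each optimizer step on one device."""
    return batch_size * grad_accumulation_factor
```

Under this assumption, a value of 0.6 gives the CTC term slightly more influence than the decoder term. With `grad_accumulation_factor` set to 1, each optimizer step sees 30 examples per device; training on several devices would multiply that figure further.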

Citation

If this model contributes to your research, please cite the work:

@inproceedings{zevallos2025conmambasmallca,
  title={Evaluating High-Performance and Lightweight ASR Systems for Catalan},
  author={Zevallos, Rodolfo},
  organization={Barcelona Supercomputing Center},
  year={2025}
}

Additional Information

Author

The model was trained in September 2025 at the Language Technologies Laboratory of the Barcelona Supercomputing Center by Rodolfo Zevallos.

Contact

For further information, please send an email to bsc-lt@bsc.es.

Copyright

Copyright (c) 2025 by Language Technologies Laboratory, Barcelona Supercomputing Center.

License

GPL-3.0

Funding

This work has been promoted and financed by the Generalitat de Catalunya through the Aina project.

The conversion of the model was possible thanks to the computing time provided by Barcelona Supercomputing Center through MareNostrum 5.
