Model card
AMALIA-SFT-FALA is a Portuguese ASR checkpoint based on the SLAM-ASR architecture. It connects a pre-trained speech encoder to a large language model through a learned projection module, allowing the LLM to generate text transcriptions from speech representations.
Architecture
The model follows the SLAM-ASR design:
audio → speech encoder → projector → LLM → transcription
The speech encoder extracts acoustic representations from the input audio. These representations are mapped by a trainable projector into the embedding space expected by the language model. The LLM then autoregressively generates the transcription.
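To make the shape flow concrete, here is a minimal pure-Python sketch of the projector step. It is not the released implementation: the card does not describe the projector's internals, so the frame stacking, the two-layer linear/ReLU design, the toy dimensions, and every name below (`stack_frames`, `make_linear`, `project`) are assumptions chosen for illustration. Random weights stand in for trained parameters.

```python
import random

def stack_frames(frames, k):
    """Concatenate every k consecutive encoder frames into one longer vector.

    This shortens the sequence by a factor of k; leftover frames at the end are dropped.
    """
    usable = len(frames) - len(frames) % k
    return [
        [value for frame in frames[i:i + k] for value in frame]
        for i in range(0, usable, k)
    ]

def make_linear(rng, in_dim, out_dim):
    """Return (weight, bias) with small random weights, standing in for trained parameters."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias

def linear(vector, layer):
    """Apply y = W x + b to a single vector."""
    weight, bias = layer
    return [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weight, bias)]

def project(frames, k, layer1, layer2):
    """Map encoder frames to LLM-sized embeddings: stack, linear, ReLU, linear."""
    projected = []
    for stacked in stack_frames(frames, k):
        hidden = [max(0.0, h) for h in linear(stacked, layer1)]
        projected.append(linear(hidden, layer2))
    return projected

# Toy sizes (assumed, not the real model dimensions).
ENCODER_DIM, HIDDEN_DIM, LLM_DIM, STACK = 8, 16, 12, 5

rng = random.Random(0)
# Stand-in for the speech encoder output: 23 frames of ENCODER_DIM features each.
encoder_frames = [[rng.gauss(0.0, 1.0) for _ in range(ENCODER_DIM)] for _ in range(23)]
layer1 = make_linear(rng, ENCODER_DIM * STACK, HIDDEN_DIM)
layer2 = make_linear(rng, HIDDEN_DIM, LLM_DIM)

speech_embeddings = project(encoder_frames, STACK, layer1, layer2)
print(len(encoder_frames), "encoder frames ->", len(speech_embeddings), "LLM-sized embeddings")
```

With these toy sizes, 23 encoder frames become 4 vectors of width 12. In a full system, vectors like these would be passed to the LLM as input embeddings, and the LLM would generate the transcription token by token.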
Components
- Base architecture: SLAM-ASR
- Speech encoder: AMALIA-speech-encoder, also available on Hugging Face
- Language model: AMALIA-9B-1225-SFT
- Task: Automatic Speech Recognition
- Language: Portuguese, with a focus on European Portuguese
Intended Use
This model is intended for research and experimentation on European Portuguese ASR, especially for setups that connect speech representations to an LLM instead of using a standalone encoder-decoder ASR model.
Example use cases include:
- transcribing Portuguese speech;
- evaluating LLM-augmented ASR systems;
- comparing SLAM-style ASR with conventional ASR models;
- research on European Portuguese speech processing.
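For the evaluation and comparison use cases, a common metric is word error rate (WER): the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below is a generic implementation, not the authors' evaluation script. The card does not state which metric or text normalization was used, so the function name `word_error_rate` and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1       # reference word missing from hypothesis
            insertion = current[j - 1] + 1   # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)

# Made-up example: one substitution (missing accent) and one deleted word.
reference = "a assembleia da república reuniu hoje"
hypothesis = "a assembleia da republica reuniu"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.3f}")
```

Here the hypothesis has two errors against six reference words, so the WER is 2/6. Scores depend on how text is normalized before scoring (casing, punctuation, accents), so the same normalization should be applied to every system being compared.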
Training Data
The model was developed in the context of European Portuguese ASR experiments using FalAR and CAMÕES ASR data.
FalAR contains more than 5000 hours of parliamentary speech from the Assembleia da República.
CAMÕES is a curated collection of up to 14 sub-corpora covering different domains and speech styles, totaling approximately 425 hours of speech.
The encoder AMALIA-speech-encoder was trained separately and reused as the speech encoder in this SLAM-ASR setup.
Limitations
This checkpoint is intended as a research artifact. Performance may vary depending on audio quality, speaker domain, recording conditions, and transcription style. The model may be less reliable on noisy audio, long-form speech, code-switching, or domains that differ from the training data.
Citation / Acknowledgements
This model builds on the SLAM-ASR idea of connecting a speech encoder to a large language model for automatic speech recognition.