Model card

AMALIA-SFT-FALA is a Portuguese ASR checkpoint based on the SLAM-ASR architecture. It connects a pre-trained speech encoder to a large language model through a learned projection module, allowing the LLM to generate text transcriptions from speech representations.

Architecture

The model follows the SLAM-ASR design:

audio → speech encoder → projector → LLM → transcription

The speech encoder extracts acoustic representations from the input audio. These representations are mapped by a trainable projector into the embedding space expected by the language model. The LLM then autoregressively generates the transcription.
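The projector step can be illustrated with a small sketch. The SLAM-ASR paper describes a projector that concatenates every k consecutive encoder frames and passes the result through a linear layer, a ReLU, and a second linear layer. This card does not say whether this checkpoint uses the same projector, so the code below is an assumption-laden illustration, not the released implementation. The dimensions, the random weights, and all function names are hypothetical.

```python
import random

def stack_frames(frames, k):
    """Concatenate every k consecutive frames along the feature axis.

    frames: list of T feature vectors (lists of floats), each of size d.
    Returns T // k vectors of size k * d; trailing frames that do not
    fill a complete group are dropped.
    """
    usable = len(frames) - len(frames) % k
    return [
        [x for frame in frames[i:i + k] for x in frame]
        for i in range(0, usable, k)
    ]

def linear(vec, weight, bias):
    """Apply y = W x + b, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weight, bias)]

def relu(vec):
    return [max(0.0, x) for x in vec]

def init_linear(n_in, n_out, rng):
    """Random weights stand in for trained parameters (assumption)."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
              for _ in range(n_out)]
    bias = [0.0] * n_out
    return weight, bias

def project(frames, k, layer1, layer2):
    """Map encoder frames to LLM-sized embeddings: stack, Linear, ReLU, Linear."""
    out = []
    for vec in stack_frames(frames, k):
        hidden = relu(linear(vec, *layer1))
        out.append(linear(hidden, *layer2))
    return out

# Toy sizes chosen only for illustration; real models are far larger.
ENC_DIM, HIDDEN_DIM, LLM_DIM, K = 4, 8, 6, 5
rng = random.Random(0)
layer1 = init_linear(ENC_DIM * K, HIDDEN_DIM, rng)
layer2 = init_linear(HIDDEN_DIM, LLM_DIM, rng)

# 23 fake encoder frames stand in for the speech encoder output.
encoder_frames = [[rng.gauss(0.0, 1.0) for _ in range(ENC_DIM)]
                  for _ in range(23)]
speech_embeddings = project(encoder_frames, K, layer1, layer2)
print(len(speech_embeddings), len(speech_embeddings[0]))  # prints "4 6"
```

With 23 input frames and k = 5, the last three frames are dropped and four embeddings of size 6 remain. In a SLAM-style system these vectors would be placed alongside the prompt token embeddings before the LLM generates the transcription.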

Components

Base architecture: SLAM-ASR
Speech encoder: AMALIA-speech-encoder, also available on Hugging Face
Language model: AMALIA-9B-1225-SFT
Task: Automatic Speech Recognition
Language: Portuguese, with focus on European Portuguese

Intended Use

This model is intended for research and experimentation on European Portuguese ASR, especially in settings where speech representations are connected to an LLM instead of using a standalone encoder-decoder ASR model.

Example use cases include:

transcribing Portuguese speech;
evaluating LLM-augmented ASR systems;
comparing SLAM-style ASR with conventional ASR models;
research on European Portuguese speech processing.
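
For the evaluation and comparison use cases, word error rate (WER) is the usual metric for ASR output. This card does not describe an evaluation protocol, so the following is only a minimal sketch: it assumes whitespace tokenization and applies no text normalization, and the function name and example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)

# One substituted word out of four: "está" transcribed as "esta".
print(word_error_rate("a sessão está aberta", "a sessão esta aberta"))  # prints 0.25
```

Because the comparison is exact, differences in casing, punctuation, or diacritics count as errors. When comparing a SLAM-style model with a conventional ASR model, applying the same normalization to both outputs keeps the scores comparable.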

Training Data

The model was developed in the context of European Portuguese ASR experiments using FalAR and CAMÕES ASR data. FalAR contains more than 5000 hours of parliamentary speech from the Assembleia da República. CAMÕES is a curated collection of up to 14 sub-corpora covering different domains and speech styles, totaling approximately 425 hours of speech. The AMALIA-speech-encoder was trained separately and is reused as the speech encoder in this SLAM-ASR setup.

Limitations

This checkpoint is intended as a research artifact. Performance may vary depending on audio quality, speaker domain, recording conditions, and transcription style. The model may be less reliable on noisy audio, long-form speech, code-switching, or domains that differ from the training data.

Citation / Acknowledgements

This model builds on the SLAM-ASR idea of connecting a speech encoder to a large language model for automatic speech recognition.

Papers for amalia-llm/AMALIA-SFT-FALA

FalAR: A Large-scale Speaker-Annotated European Portuguese Speech Corpus of Parliamentary Sessions. Paper 2605.27062, published May 26.
CAMÕES: A Comprehensive Automatic Speech Recognition Benchmark for European Portuguese. Paper 2508.19721, published Aug 27, 2025.
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity. Paper 2402.08846, published Feb 13, 2024.