MERaLiON/MERaLiON-AudioLLM-Whisper-SEA-LION

Automatic Speech Recognition


hyx_194 committed on Dec 9, 2024

Commit 9feb397 · 1 parent: 0839443

add image

Files changed (1)

README.md +2 -0

README.md CHANGED

@@ -28,6 +28,8 @@ The architecture comprises three key components: an audio encoder that transform
 Specifically, we fine-tuned the MERaLiON-Whisper encoder from Whisper-large-v2 for the audio encoder and used SEA-LION V3, a localised LLM developed by our partner AI Singapore as the text decoder.
+<img src="model_architecture.png" alt="model_architecture" width="600" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
 ## Capabilities
 MERaLiON-AudioLLM is trained to address 8 tasks, including Automatic Speech Recognition (ASR), Speech Translation (ST), Spoken Question Answering (SQA), Spoken Dialogue Summarization (SDS), Speech Instruction (SI), Paralinguistics (PARA), Audio Captioning (AC), and Audio Scene Question Answering (ASQA).
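The `style` value in the added `<img>` tag wraps each CSS value in quotes and omits the `;` separators between declarations, so it is not valid CSS and browsers will most likely discard it rather than centre the image. A minimal sketch of the presumably intended markup, assuming the goal was a horizontally centred block image (the declarations are standard CSS; whether the Hub's README renderer preserves inline `style` attributes is not something this commit shows):

```html
<!-- display:block plus equal auto left/right margins centres a fixed-width image -->
<img src="model_architecture.png" alt="model_architecture" width="600" style="display:block; margin-left:auto; margin-right:auto"/>
```

The `src`, `alt`, and `width` attributes are unchanged from the commit; only the `style` value is rewritten.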