hyx_194
committed on
Commit
·
9feb397
1
Parent(s):
0839443
add image
README.md
CHANGED
@@ -28,6 +28,8 @@ The architecture comprises three key components: an audio encoder that transform
 
 Specifically, we fine-tuned the MERaLiON-Whisper encoder from Whisper-large-v2 for the audio encoder and used SEA-LION V3, a localised LLM developed by our partner AI Singapore as the text decoder.
 
+<img src="model_architecture.png" alt="model_architecture" width="600" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
+
 ## Capabilities
 
 MERaLiON-AudioLLM is trained to address 8 tasks, including Automatic Speech Recognition (ASR), Speech Translation (ST), Spoken Question Answering (SQA), Spoken Dialogue Summarization (SDS), Speech Instruction (SI), Paralinguistics (PARA), Audio Captioning (AC), and Audio Scene Question Answering (ASQA).