hyx_194 committed
Commit 9feb397 · 1 Parent(s): 0839443
Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -28,6 +28,8 @@ The architecture comprises three key components: an audio encoder that transform
 
 Specifically, we fine-tuned the MERaLiON-Whisper encoder from Whisper-large-v2 for the audio encoder and used SEA-LION V3, a localised LLM developed by our partner AI Singapore as the text decoder.
 
+<img src="model_architecture.png" alt="model_architecture" width="600" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
+
 ## Capabilities
 
 MERaLiON-AudioLLM is trained to address 8 tasks, including Automatic Speech Recognition (ASR), Speech Translation (ST), Spoken Question Answering (SQA), Spoken Dialogue Summarization (SDS), Speech Instruction (SI), Paralinguistics (PARA), Audio Captioning (AC), and Audio Scene Question Answering (ASQA).