tsinghua-ee/SALMONN


Changli committed on Sep 7, 2023

Commit 96f3cb5 • 1 Parent(s): 144d332

Update README.md

Files changed (1)

README.md +2 -2


@@ -2,7 +2,7 @@
 license: apache-2.0
 ---
 # SALMONN: Speech Audio Language Music Open Neural Network
-<div align=center><img src="resource/salmon.png" height="256px" width="256px"/></div>
+<div align=center><img src="https://cdn-uploads.huggingface.co/production/uploads/63770389cdcc1bf630870758/sr9ABG_rv6P-VgesTMhOC.png" height="256px" width="256px"/></div>
 Welcome to the repo of **SALMONN**!
@@ -19,7 +19,7 @@ We will open source the code and the model checkpoint soon. Stay tuned!
 SALMONN adopts a speech & audio encoder to encode generic audio representation, then uses an audio-text aligner to map the audio feature into textual space. Finally, the large language model answers based on the textual prompt and the auditory tokens.
-<div align=center><img src="resource/structure.png" height="75%" width="75%"/></div>
+<div align=center><img src="https://cdn-uploads.huggingface.co/production/uploads/63770389cdcc1bf630870758/TEZzr54VZ5yc34LeixFbi.png" height="75%" width="75%"/></div>
 ## Demos