hojin committed on
Commit
dd65ee5
1 Parent(s): 5444374

Update README.md

Files changed (1)
  1. README.md +16 -1
README.md CHANGED
@@ -3,4 +3,19 @@ widget:
  - text: Jens Peter Hansen kommer fra Danmark
  pipeline_tag: automatic-speech-recognition
  license: afl-3.0
- ---
+ language:
+ - en
+ tags:
+ - text-generation-inference
+ ---
+
+ - Abstract
+
+ Language models (LMs) have advanced significantly, but their ability to take human speech as input remains limited: audio is often transcribed to text first, which loses vital vocal characteristics. To address this challenge, we train a transformer-based language model that accepts audio input alongside the preceding conversation or prompt and predicts the subsequent utterance. In addition to publicly available language model data, we collect 3K hours of audio from the web, creating audio-text pairs that represent the ensuing conversation. We also augment the training data by converting publicly available vocal characteristic labels (e.g., sentiment, gender) associated with the audio into language-based descriptions, enhancing the model's understanding of vocal nuances. Our findings demonstrate the model's capacity to perceive and comprehend audio content and to generate meaningful responses grounded in auditory information. This work illustrates the potential of language models to facilitate audio-based interactions, bridging the gap between textual and vocal communication.
+
+
+ - colab
+ https://colab.research.google.com/drive/12F8EVlMZldaaEdubbKCPcxd12ORnpDOe?usp=sharing
+
+ - Technical report
+ XXX
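
The abstract added in this commit mentions converting vocal characteristic labels (e.g., sentiment, gender) into language-based descriptions. The README does not show how this is done, so the following is only a minimal sketch of one possible approach, not the authors' pipeline. The `TEMPLATES` table, the `describe_labels` function, the label names, and the sentence wording are all hypothetical and chosen for illustration.

```python
from typing import Mapping

# Hypothetical templates: label names and wording are illustrative assumptions,
# not taken from the model's actual training data.
TEMPLATES = {
    "sentiment": "The speaker sounds {value}.",
    "gender": "The speaker's voice is {value}.",
}


def describe_labels(labels: Mapping[str, str]) -> str:
    """Turn vocal characteristic labels into one plain-language description."""
    sentences = []
    for name in sorted(labels):  # sorted so the output order is deterministic
        template = TEMPLATES.get(name)
        if template is None:
            continue  # skip labels without a template rather than guess wording
        sentences.append(template.format(value=labels[name]))
    return " ".join(sentences)


if __name__ == "__main__":
    print(describe_labels({"sentiment": "happy", "gender": "female"}))
```

A description produced this way could be paired with the corresponding audio clip as an extra text target. Whether the authors used fixed templates, paraphrasing, or another method is not stated in the README.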