Feature Extraction
Transformers
Safetensors
English
custom_model
multi-modal
speech-language
custom_code
Eval Results
shangeth committed on
Commit
183a43c
1 Parent(s): 1b27420

Update README.md

Files changed (1)
  1. README.md +8 -0
README.md CHANGED
@@ -57,6 +57,14 @@ model-index:
 
  # SpeechLLM
 
+ SpeechLLM is a multi-modal LLM trained to predict the metadata of the speaker's turn in a conversation. The SpeechLLM model is based on the HubertX acoustic encoder and the TinyLlama LLM. The model predicts the following:
+ 1. Speech Activity
+ 2. ASR Transcript
+ 3. Gender of the speaker
+ 4. Age of the speaker
+ 5. Accent of the speaker
+ 6. Emotion of the speaker
+
  ## Usage
  ```python
  # Load model directly from huggingface