rahular committed on
Commit
8c0524d
·
verified ·
1 Parent(s): 11f413e

Update README.md

  1. README.md +14 -1
README.md CHANGED
@@ -4,6 +4,17 @@ language:
  - en
  - hi
  ---
+
+ `Shuka v1` is a language model that natively understands audio in Indic languages. It is an encoder-decoder model built by combining two models:
+ - Our state-of-the-art, in-house audio encoder: Saaras v1
+ - Meta’s Llama3-8B-Instruct as the decoder
+
+ The encoder and decoder are connected by a small projector with ~60M parameters. During training, only the projector weights are finetuned while the rest of the network is frozen. Following our tradition of training models frugally, we train `Shuka v1` on fewer than 100 hours of audio.
+
+ Though we only finetune the projector on English and Hindi data, the multilingual nature of our encoder lets `Shuka v1` perform well on zero-shot QA in other Indic languages too. We have tested the model on Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu.
+
+ You can get started by using the Hugging Face pipeline, as follows:
+
  ```
  import transformers
  import librosa
@@ -21,4 +32,6 @@ turns = [
  ]
 
  pipe({'audio': audio, 'turns': turns, 'sampling_rate': sr}, max_new_tokens=512)
- ```
+ ```
+
+ For more details, please see our blog (link coming soon).
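
The README text added in this commit says that only the projector is finetuned while the encoder and decoder stay frozen. A minimal PyTorch sketch of that general pattern is shown here. It is not the Shuka training code: the three modules are tiny hypothetical stand-ins (plain linear layers with made-up sizes), and the optimizer, loss, and learning rate are assumptions chosen only to make the example runnable.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins, NOT the real models: tiny layers in place of the
# audio encoder, the projector, and the LLM decoder. All sizes are made up.
encoder = nn.Linear(16, 32)
projector = nn.Sequential(nn.Linear(32, 64), nn.GELU(), nn.Linear(64, 48))
decoder = nn.Linear(48, 10)

# Freeze the encoder and decoder so that only the projector receives updates.
for module in (encoder, decoder):
    for param in module.parameters():
        param.requires_grad = False

# Hand only the projector's parameters to the optimizer (assumed: AdamW).
optimizer = torch.optim.AdamW(projector.parameters(), lr=1e-3)


def count_trainable(*modules):
    """Count the parameters that will receive gradients."""
    return sum(
        p.numel() for m in modules for p in m.parameters() if p.requires_grad
    )


# Snapshot weights so we can check what one training step changes.
encoder_before = encoder.weight.detach().clone()
projector_before = projector[0].weight.detach().clone()

# One toy training step on random data (assumed loss: cross-entropy).
features = torch.randn(4, 16)
targets = torch.randint(0, 10, (4,))
logits = decoder(projector(encoder(features)))
loss = nn.functional.cross_entropy(logits, targets)
loss.backward()
optimizer.step()

print("trainable in encoder + decoder:", count_trainable(encoder, decoder))
print("trainable in projector:", count_trainable(projector))
```

Gradients still flow backward through the frozen decoder to reach the projector. Setting `requires_grad = False` only stops the frozen weights themselves from accumulating gradients and being updated.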