Navvye committed on
Commit
8d0653d
1 Parent(s): 9834f70

Updated Readme.md


Provided vital information for the model.

Files changed (1)
  1. README.md +16 -5
README.md CHANGED
@@ -31,20 +31,31 @@ It achieves the following results on the evaluation set:
  - Loss: 0.2967
  - Wer: 0.1740
 
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
+ ## Usage
+
+ To evaluate this model on an entire dataset, the evaluation scripts available in the whisper-finetune repository can be used.
+
+ The same repository also provides scripts for faster inference using whisper-jax.
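Independent of those repository scripts, a quick way to try the checkpoint is the `transformers` ASR pipeline. This is a minimal sketch, not the commit's own code; the repo id `Navvye/whisper-kangri` and the audio path are placeholders for this model's actual id and a real file:

```python
# Minimal transcription sketch using the transformers ASR pipeline.
# "Navvye/whisper-kangri" and "sample.wav" are placeholders -- substitute
# this checkpoint's actual repo id and a real audio file.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="Navvye/whisper-kangri",  # hypothetical repo id
    chunk_length_s=30,              # chunk long recordings into 30 s windows
)

# The pipeline decodes the file and resamples it to Whisper's 16 kHz input.
print(asr("sample.wav")["text"])
```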
 
  ## Training and evaluation data
 
- More information needed
+ Training Data:
+ - [Snow Mountain Dataset for Kangri Language](https://huggingface.co/datasets/bridgeconn/snow-mountain)
+
+ Evaluation Data:
+ - [Snow Mountain Dataset for Kangri Language](https://huggingface.co/datasets/bridgeconn/snow-mountain)
+ - [Kangri Translators Dataset](https://drive.google.com/drive/folders/16BdOieekGRAo2bFOQDd4YhE2LpgiRnqQ?usp=share_link)
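As a rough sketch of pulling the Snow Mountain data with the `datasets` library: the per-language configuration name `"kangri"` is an assumption, so check the dataset card for the exact string:

```python
# Sketch: load the Snow Mountain corpus listed above.
# The configuration name "kangri" is an assumption -- the dataset exposes
# per-language configs, so verify the exact name on the dataset card.
from datasets import load_dataset

snow_mountain = load_dataset("bridgeconn/snow-mountain", "kangri")
print(snow_mountain)  # inspect the available splits and columns
```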
 
  ## Training procedure
 
+ We implemented Cross-Lingual Phoneme Recognition, a process that leverages patterns in a
+ resource-rich language such as Hindi to recognize utterances in a resource-poor language
+ such as Kangri. By fine-tuning the pre-trained Whisper-Hindi-Large-V2 model on a customised
+ dataset, we achieved state-of-the-art accuracy. The customised dataset, consisting of
+ bridgeconn/snow-mountain and sentences collected from Kangri translators, was split 80/20
+ into training and evaluation sets. The model was evaluated over 5000 training steps: the
+ word error rate decreases by 0.6% after the initial 1000 steps, while the validation loss
+ increases as more data is introduced.
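A hedged sketch of the 80/20 split and the 5000-step schedule described above, assuming the standard Hugging Face `Trainer` stack was used; the config name, batch size, and output directory are assumptions rather than values from this commit:

```python
# Sketch of the 80/20 split and 5000-step schedule described above.
# Config name, batch size, and output_dir are assumptions, not values
# taken from this commit; the actual hyperparameters are listed below.
import evaluate
from datasets import load_dataset
from transformers import Seq2SeqTrainingArguments

# Stand-in for the customised corpus (snow-mountain + translator sentences).
corpus = load_dataset("bridgeconn/snow-mountain", "kangri", split="train")

# 80/20 split into training and evaluation sets.
splits = corpus.train_test_split(test_size=0.2, seed=42)
train_set, eval_set = splits["train"], splits["test"]

# 5000 training steps, with evaluation every 1000 steps.
args = Seq2SeqTrainingArguments(
    output_dir="whisper-kangri",
    max_steps=5000,
    evaluation_strategy="steps",
    eval_steps=1000,
    per_device_train_batch_size=16,  # assumption; see hyperparameters below
    predict_with_generate=True,      # generate text so WER can be computed
)

# Word error rate is the reported metric (Wer: 0.1740 above).
wer_metric = evaluate.load("wer")
```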
+
  ### Training hyperparameters
 
  The following hyperparameters were used during training: