Lingalingeswaran
committed on
Commit
•
36509f3
1
Parent(s):
52b1014
Update README.md
README.md
CHANGED
@@ -40,15 +40,20 @@ It achieves the following results on the evaluation set:
|
|
40 |
|
41 |
## Model description
|
42 |
|
43 |
-
|
44 |
|
45 |
## Intended uses & limitations
|
46 |
|
47 |
-
|
48 |
|
49 |
## Training and evaluation data
|
50 |
|
51 |
-
|
52 |
|
53 |
## Training procedure
|
54 |
|
|
|
40 |
|
41 |
## Model description
|
42 |
|
43 |
+ This Whisper model has been fine-tuned for Tamil on the Common Voice 11.0 dataset. It is intended for speech-to-text transcription and language identification in applications where Tamil is the primary language. Fine-tuning focused on lowering the transcription error rate for Tamil speech.
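As an illustration of the transcription use described above, here is a minimal sketch built on the Hugging Face `transformers` ASR pipeline. The README does not state the model's Hub id, so `MODEL_ID` is a hypothetical placeholder, and the helper names `tamil_generate_kwargs` and `transcribe` are assumptions made for this example rather than part of the model's own tooling.

```python
from typing import Dict

# Hypothetical placeholder: the README does not state the model's Hub id.
MODEL_ID = "your-username/whisper-tamil"


def tamil_generate_kwargs() -> Dict[str, str]:
    """Decoding options that pin Whisper to Tamil transcription."""
    return {"language": "tamil", "task": "transcribe"}


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file; the checkpoint is downloaded on first use."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path, generate_kwargs=tamil_generate_kwargs())
    return result["text"].strip()
```

Calling `transcribe("sample.wav")` with a real model id and a local audio file would return the recognised Tamil text. Fixing `language` and `task` skips Whisper's automatic language detection, which seems a reasonable choice for a Tamil-only deployment.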
|
44 |
|
45 |
## Intended uses & limitations
|
46 |
+ Intended Uses:
|
47 |
+ Speech-to-text transcription in Tamil
|
48 |
|
49 |
+ Limitations:
|
50 |
+ May not perform as well on languages or dialects that are not well represented in the Common Voice dataset.
|
51 |
+ Word Error Rate (WER) is likely to be higher in noisy environments or for speakers whose accents are not covered in the training data.
|
52 |
+ The model is optimized for Tamil; performance in other languages may be suboptimal.
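The limitations above are stated in terms of Word Error Rate. For readers unfamiliar with the metric, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` is an assumption for illustration; the README does not say which tool was used to compute the reported score, and no text normalisation is applied here.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

In practice an established package such as `jiwer` is the more usual choice, typically after normalising punctuation and whitespace on both strings.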
|
53 |
|
54 |
## Training and evaluation data
|
55 |
|
56 |
+ The training data consists of Tamil voice recordings from the Mozilla Foundation Common Voice 11.0 dataset. Common Voice is a crowd-sourced collection of transcribed speech, which gives it a range of speaker accents, age groups, and speech styles.
|
57 |
|
58 |
## Training procedure
|
59 |
|