yacht committed on
Commit f7e3ab1 · verified · 1 Parent(s): fe2d12d

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +48 -12
README.md CHANGED
@@ -1,15 +1,15 @@
  ---
  language:
- - en
- - th
  tags:
- - transliteration
- - thai
- - english
- - byt5
  license: mit
  datasets:
- - custom
  library_name: transformers
  pipeline_tag: text2text-generation
  ---
@@ -25,7 +25,7 @@ This model is based on ByT5, a token-free sequence-to-sequence model that operat
  - **Developed by**: Thai NLP Research Group
  - **Model type**: ByT5 (Sequence-to-Sequence)
  - **Language(s)**: English → Thai
- - **License**: MIT
 
  ## Intended Uses & Limitations
 
@@ -53,13 +53,15 @@ The model was trained on a custom dataset of English-Thai transliteration pairs.
  - **Training hyperparameters**:
  - Learning rate: `2e-4`
  - Batch size: `16`
- - Number of epochs: `20`
  - Optimizer: `AdamW`
 
  ### Evaluation Results
- - **Accuracy**: `0.78`
- - **Character Error Rate**: `0.12`
- - **Mean Levenshtein Distance**: `1.45`
 
  ## How to Use
 
@@ -87,6 +89,9 @@ print(f"English: {english_text} → Thai: {thai_text}")
  | computer | คอมพิวเตอร์ |
  | thailand | ไทยแลนด์ |
  | bangkok | แบงค็อก |
 
  ## Limitations and Bias
 
@@ -96,3 +101,34 @@ The model's performance may vary based on:
  - The phonetic complexity of the input
  - Whether the input contains sounds that are difficult to represent in Thai
  - The coverage of similar words in the training data
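
The Evaluation Results section in both versions of the README reports accuracy, character error rate and mean Levenshtein distance without defining them. A minimal Python sketch of one common set of definitions follows. It assumes that accuracy is exact match on the whole transliteration, that character error rate is total edit distance divided by the total number of reference characters (Unicode code points), and that mean Levenshtein distance is the per-word edit distance averaged over the test set. The README does not say which definitions were used, and `levenshtein` and `evaluate` are hypothetical names.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) over Unicode code points."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when characters match
            ))
        prev = curr
    return prev[-1]


def evaluate(predictions: list[str], references: list[str]) -> dict[str, float]:
    """Assumed definitions: exact-match accuracy, corpus-level CER, mean edit distance."""
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and equal length")
    distances = [levenshtein(p, r) for p, r in zip(predictions, references)]
    exact = sum(p == r for p, r in zip(predictions, references))
    return {
        "accuracy": exact / len(references),
        "cer": sum(distances) / sum(len(r) for r in references),
        "mean_levenshtein": sum(distances) / len(distances),
    }


if __name__ == "__main__":
    # Pairs adapted from the README's Common Errors list and example table.
    preds = ["กรุ๊ป", "กลาสโกว์", "กราฟ"]
    refs = ["กรูป", "กลาสโกว", "กราฟ"]
    print(evaluate(preds, refs))
```

Because distances are counted in code points, a stray thanthakhat (์), as in the กลาสโกว์/กลาสโกว error, costs exactly one edit. Under these assumed definitions, CER is normalized by reference length while mean Levenshtein distance is not, so the two figures are not directly interchangeable.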
 
  ---
  language:
+ - en
+ - th
  tags:
+ - transliteration
+ - thai
+ - english
+ - byt5
  license: mit
  datasets:
+ - custom
  library_name: transformers
  pipeline_tag: text2text-generation
  ---
 
  - **Developed by**: Thai NLP Research Group
  - **Model type**: ByT5 (Sequence-to-Sequence)
  - **Language(s)**: English → Thai
+ - **License**: MIT (free for commercial use)
 
  ## Intended Uses & Limitations
 
 
  - **Training hyperparameters**:
  - Learning rate: `2e-4`
  - Batch size: `16`
+ - Number of epochs: `10`
  - Optimizer: `AdamW`
+ - Mixed precision: FP16
+ - Gradient clipping: Yes (max_grad_norm=1.0)
 
  ### Evaluation Results
+ - **Accuracy**: `0.7709`
+ - **Character Error Rate**: `0.0597`
+ - **Mean Levenshtein Distance**: `0.4708`
 
  ## How to Use
 
 
  | computer | คอมพิวเตอร์ |
  | thailand | ไทยแลนด์ |
  | bangkok | แบงค็อก |
+ | graph | กราฟ |
+ | grossular | กรอสซูลาร์ |
+ | grossularite | กรอสซูลาไรต์ |
 
  ## Limitations and Bias
 
 
  - The phonetic complexity of the input
  - Whether the input contains sounds that are difficult to represent in Thai
  - The coverage of similar words in the training data
+
+ ## Common Errors
+
+ Some common error patterns observed:
+ - group → กรุ๊ป (should be: กรูป)
+ - glaucochroite → กลอโคครอยต์ (should be: กลอโคโครไอต์)
+ - glasgow → กลาสโกว์ (should be: กลาสโกว)
+
+ ## License
+
+ MIT License
+
+ Copyright (c) 2023 Thai NLP Research Group
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
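
The model card describes the model as based on ByT5, a token-free sequence-to-sequence model. ByT5 reads raw UTF-8 bytes, and the Hugging Face ByT5 documentation shows input being encoded by adding 3 to each byte value, because IDs 0, 1 and 2 are reserved for padding, end-of-sequence and unknown tokens. The pure-Python sketch below mimics that mapping without loading a tokenizer, to show how Thai targets grow in length: every character in the Thai Unicode block (U+0E00 to U+0E7F) takes three bytes in UTF-8. It is an illustration under those assumptions, not the model's actual preprocessing, and `byt5_ids` and `byt5_decode` are hypothetical helpers.

```python
PAD_ID, EOS_ID, UNK_ID = 0, 1, 2  # IDs reserved before the 256 byte IDs
OFFSET = 3                        # byte value b is mapped to ID b + OFFSET


def byt5_ids(text: str, add_eos: bool = True) -> list[int]:
    """Map text to ByT5-style IDs: one ID per UTF-8 byte, optional trailing EOS."""
    ids = [b + OFFSET for b in text.encode("utf-8")]
    if add_eos:
        ids.append(EOS_ID)
    return ids


def byt5_decode(ids: list[int]) -> str:
    """Invert byt5_ids, skipping IDs outside the byte range (pad, EOS, unk, extras)."""
    data = bytes(i - OFFSET for i in ids if OFFSET <= i < OFFSET + 256)
    return data.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    for word in ["graph", "กราฟ"]:
        ids = byt5_ids(word)
        print(f"{word}: {len(word)} characters -> {len(ids)} IDs {ids}")
```

With this mapping, "graph" becomes 6 IDs (5 bytes plus end-of-sequence) while "กราฟ" becomes 13 (4 characters × 3 bytes plus end-of-sequence), so a Thai output needs roughly three decoding steps per character.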