|
--- |
|
license: apache-2.0 |
|
language: |
|
- hi |
|
- en |
|
--- |
|
|
|
This repository contains the PyTorch model parameters and associated data used to train a small transformer model from scratch. |
|
The transformer model is trained to translate from Hindi written in Latin script (hindi_latin) to English. |
|
|
|
The training dataset used to create the model is included among the files. The training data is semi-synthetic. |
|
|
|
Steps for creating datasets: |
|
Obtain actual user questions in Hindi and their human translations into English. |
|
Prompt GPT to create variations of the key words, taking phonetics into account and specifying a user persona. |
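
The second step can be sketched roughly as follows. This is only an illustration of how such a prompt might be assembled; the actual prompt used for this dataset is not given here. The names `SeedPair` and `build_variation_prompt`, the prompt wording, and the sample question are all hypothetical, and the call to GPT itself is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class SeedPair:
    """One real user question with its human translation (step 1)."""
    hindi_latin: str
    english: str


def build_variation_prompt(pair: SeedPair, key_words: list[str],
                           persona: str, n_variants: int = 5) -> str:
    """Assemble a text prompt asking for phonetic spelling variations.

    Hypothetical wording: the real prompt is not documented.
    """
    words = ", ".join(key_words)
    return (
        f"You are writing as this user: {persona}\n"
        f"Original question (Hindi in Latin script): {pair.hindi_latin}\n"
        f"English translation: {pair.english}\n"
        f"Key words: {words}\n"
        f"Give {n_variants} alternative spellings for each key word "
        "that sound the same when read aloud, one key word per line."
    )


# Made-up example pair, for illustration only (not taken from the dataset).
example = SeedPair("mera khata kaise kholu", "How do I open my account?")
prompt = build_variation_prompt(
    example,
    key_words=["khata", "kholu"],
    persona="a first-time banking customer typing on a phone",
    n_variants=3,
)
print(prompt)
```

The resulting string would then be sent to the language model, and each returned spelling substituted into the original question to produce a new synthetic source sentence paired with the same English translation.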
|
|