Update README.md
README.md CHANGED

@@ -41,7 +41,7 @@ This version of the Google T5-Base model has been fine-tuned on a bilingual data
 ### Model Sources
 
 <!-- Provide the basic links for the model. -->
 
-- **Repository:** https://github.com/leks-forever/
+- **Repository:** https://github.com/leks-forever/mt5-tuning
 <!-- - **Paper [optional]:** [More Information Needed] -->
 <!-- - **Demo [optional]:** [More Information Needed] -->
 
@@ -82,11 +82,6 @@ print(translation)
 
 The model was fine-tuned on the [bible-lezghian-russian](https://huggingface.co/datasets/leks-forever/bible-lezghian-russian) dataset, which contains 13,800 parallel sentences in Russian and Lezgian. The dataset was split into three parts: 90% for training, 5% for validation, and 5% for testing.
 
-### Preprocessing
-
-The preprocessing step included tokenization with a custom-trained SentencePiece NLLB-based tokenizer on the Russian-Lezgian corpus.
-
-
 
 #### Training Hyperparameters
 
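The 90%/5%/5% split of the 13,800 parallel sentences described in the README text above can be sketched as follows. This is a minimal illustration only: the shuffling, the seed, and the `(russian, lezgian)` pair representation are assumptions, not the repository's actual preprocessing code.

```python
import random

def split_corpus(pairs, train_frac=0.90, val_frac=0.05, seed=42):
    """Shuffle parallel sentence pairs and cut them into train/validation/test.

    Hypothetical sketch: fractions match the model card, everything else
    (shuffle + fixed seed) is an assumption.
    """
    rng = random.Random(seed)
    pairs = pairs[:]          # copy so the caller's list is untouched
    rng.shuffle(pairs)
    n = len(pairs)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = pairs[:n_train]
    val = pairs[n_train:n_train + n_val]
    test = pairs[n_train + n_val:]
    return train, val, test

# With the 13,800 pairs reported in the card (placeholder pair contents):
corpus = [(f"ru_{i}", f"lez_{i}") for i in range(13_800)]
train, val, test = split_corpus(corpus)
print(len(train), len(val), len(test))  # 12420 690 690
```

Slicing a single shuffled list guarantees the three parts are disjoint, which is the property the 90/5/5 split relies on.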