deepampatel/roberta-mlm-marathi

system HF staff committed on Oct 9, 2020

Commit 272b398 · 1 Parent(s): 0901777

Update README.md

Files changed (1)

README.md +2 -2

@@ -14,7 +14,7 @@ language: "mr"
 - **Dataset** - 1M data samples are used to train this model from OSCAR page(https://oscar-corpus.com/) eventhough data set is of 2.7 GB due to resource constraint to train
 I have picked only 1M data from the total 2.7GB data set. If you are interested in collaboration and have computational resources to train on you are most welcome to do so.
-- **Preprocessing** - ByteLevelBPETokenizer is used to tokenize the sentences at character level and vocabulary size is set to 52k as per standard values given by 🤗
+- **Preprocessing** - ByteLevelBPETokenizer is used to tokenize the sentences at character level and vocabulary size is set to 52k as per standard values given by 🤗
 <!-- - **Hyperparameters** - __ByteLevelBPETokenizer__ : vocabulary size = 52_000 and  min_frequency = 2
                         __Trainer__ :               num_train_epochs=12 - trained for 12 epochs
                                                     per_gpu_train_batch_size=64 - batch size for the datasamples is 64
@@ -25,4 +25,4 @@ I have picked only 1M data from the total 2.7GB data set. If you are interested
   this is for anyone who wants to make use of marathi language models for various tasks like language generation, translation and many more use cases.
 **Whatever else is helpful!**
-  If you are intersted in collaboration feel free to reach  me [Deepam](mailto:deepam8155@gmail.com)
+  If you are intersted in collaboration feel free to reach  me [Deepam](mailto:deepam8155@gmail.com)