---
license: mit
---

## Model description

This is a Turkish RoBERTa base model pretrained on Turkish Wikipedia, Turkish OSCAR, and some news websites.

The final training corpus has a size of 38 GB and contains 329,720,508 sentences.

Thanks to Turkcell, we were able to train the model for 2.5M steps on the following hardware:

- Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
- 256 GB RAM
- 2 x GV100GL [Tesla V100 PCIe 32GB] GPUs

# Usage

Load the tokenizer and model with the transformers library:

```
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("burakaytan/roberta-base-turkish-uncased")
model = AutoModelForMaskedLM.from_pretrained("burakaytan/roberta-base-turkish-uncased")
```
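Because this is a masked language model, a natural way to try it is the transformers `fill-mask` pipeline. The sketch below shows one way to wrap it. It assumes, based only on "uncased" in the model name and not on anything stated in this card, that input should be lowercased before it reaches the tokenizer. It applies Turkish casing rules because Python's `str.lower()` maps `I` to `i` and `İ` to `i` plus a combining dot, instead of to `ı` and `i`. The helper names `turkish_lower` and `fill_mask` and the `<mask>` placeholder convention are hypothetical; the real mask token is read from the loaded tokenizer.

```python
from functools import lru_cache
from typing import Any, Dict, List

MODEL_ID = "burakaytan/roberta-base-turkish-uncased"

# Turkish has dotted and dotless I. Python's default str.lower() maps
# "I" -> "i" and "İ" -> "i" + U+0307 (combining dot above), which is wrong
# for Turkish, where the correct mappings are "I" -> "ı" and "İ" -> "i".
_TURKISH_I_MAP = str.maketrans({"I": "ı", "İ": "i"})


def turkish_lower(text: str) -> str:
    """Lowercase text using Turkish rules for I/İ, then default rules for the rest."""
    return text.translate(_TURKISH_I_MAP).lower()


@lru_cache(maxsize=1)
def _load_unmasker():
    """Build the fill-mask pipeline once; downloads the model on first use."""
    # Imported lazily so turkish_lower works even without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=MODEL_ID, tokenizer=MODEL_ID)


def fill_mask(text: str, top_k: int = 5, placeholder: str = "<mask>") -> List[Dict[str, Any]]:
    """Return the top_k predictions for exactly one `placeholder` in `text`.

    Text around the placeholder is lowercased with Turkish rules (an assumption
    based on the "uncased" model name), and the placeholder is replaced with
    whatever mask token the loaded tokenizer defines.
    """
    if text.count(placeholder) != 1:
        raise ValueError(f"expected exactly one {placeholder!r} in the input")
    unmasker = _load_unmasker()
    parts = [turkish_lower(part) for part in text.split(placeholder)]
    prompt = unmasker.tokenizer.mask_token.join(parts)
    return unmasker(prompt, top_k=top_k)
```

For example, `fill_mask("Ankara Türkiye'nin <mask> şehridir.")` downloads the model on the first call and returns a list of candidate dictionaries (with `score`, `token`, `token_str`, and `sequence` keys). The pipeline is cached with `lru_cache` so repeated calls do not reload the model.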