---
license: mit
---
## Model description
This is a Turkish RoBERTa base model pretrained on Turkish Wikipedia, Turkish OSCAR, and some news websites.
The final training corpus has a size of 38 GB and contains 329,720,508 sentences.
Thanks to Turkcell, the model was trained for 2.5M steps on a machine with an Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz, 256 GB RAM, and 2 x GV100GL [Tesla V100 PCIe 32GB] GPUs.
## Usage
Load the model and tokenizer with the `transformers` library:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("burakaytan/roberta-base-turkish-uncased")
model = AutoModelForMaskedLM.from_pretrained("burakaytan/roberta-base-turkish-uncased")
```
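
The model can then be used for masked-token prediction. Below is a minimal sketch using the `fill-mask` pipeline with the objects loaded above; the example sentence and the printed fields are only illustrative.

```python
from transformers import pipeline

# Build a fill-mask pipeline from the model and tokenizer loaded above
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# RoBERTa models use "<mask>" as the mask token; the sentence is a made-up example
predictions = fill_mask("İstanbul Türkiye'nin en büyük <mask>.")

# Print the top predicted tokens with their scores
for p in predictions:
    print(p["token_str"], p["score"])
```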