Andrija/SRoBERTa-XL


	---
	datasets:
	- oscar
	- srwac
	- leipzig
	- cc100
	- hrwac
	language:
	- hr
	- sr
	- multilingual
	tags:
	- masked-lm
	widget:
	- text: "Ovo je početak <mask>."
	license: apache-2.0

	---

	# Transformer language model for Croatian and Serbian

	Trained for one epoch (3 million steps) on 28 GB of Croatian and Serbian text drawn from the following datasets:
	Leipzig Corpus, OSCAR, srWac, hrWac, cc100-hr and cc100-sr.

	| Model | #params | Arch. | Training data |
	|-----------------------|---------|--------|-----------------------------------|
	| `Andrija/SRoBERTa-XL` | 80M | Fourth | Leipzig Corpus, OSCAR, srWac, hrWac, cc100-hr and cc100-sr (28 GB of text) |
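
A minimal usage sketch for masked-token prediction, assuming the checkpoint loads through the `transformers` `fill-mask` pipeline and that the mask token is `<mask>`, as in the widget example in the metadata. The helper names `validate_prompt` and `fill_mask` are hypothetical and are not part of the model repository.

```python
MODEL_ID = "Andrija/SRoBERTa-XL"  # model id as listed in the table above
MASK_TOKEN = "<mask>"             # assumption: taken from the widget example


def validate_prompt(text: str) -> str:
    """Return the prompt unchanged if it contains exactly one mask token."""
    if text.count(MASK_TOKEN) != 1:
        raise ValueError(f"prompt must contain exactly one {MASK_TOKEN} token")
    return text


def fill_mask(text: str, top_k: int = 5):
    """Return (token, score) pairs for the masked position.

    Downloads the model from the Hugging Face Hub on first use.
    """
    # Imported lazily so that validate_prompt works without transformers installed.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model=MODEL_ID)
    predictions = unmasker(validate_prompt(text), top_k=top_k)
    return [(p["token_str"], p["score"]) for p in predictions]


# Example call (requires network access and the transformers package):
# for token, score in fill_mask("Ovo je početak <mask>."):
#     print(f"{token!r}: {score:.3f}")
```

The prompt is limited to one mask because the pipeline returns a nested list when several masks are present, which would break the simple list comprehension used here.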