---
language:
- as
- bn
- gu
- hi
- mr
- ne
- or
- pa
- si
license: apache-2.0
datasets:
- oscar
tags:
- multilingual
- albert
- masked-language-modeling
- sentence-order-prediction
- fill-mask
- nlp
---
# XLMIndic Base Uniscript
Pretrained ALBERT model on the OSCAR corpus in Assamese, Bengali, Gujarati, Hindi, Marathi,
Nepali, Oriya, Panjabi and Sinhala. Like ALBERT, it was pretrained with a masked language modeling (MLM)
and a sentence order prediction (SOP) objective. Before pretraining, the text was transliterated
to the ISO-15919 format using the Aksharamukha library. A demo of the Aksharamukha library is hosted [here](https://aksharamukha.appspot.com/converter),
where you can transliterate your text and try it on our model in the inference widget.
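To illustrate what the transliteration step does, here is a toy sketch of ISO-15919 romanization for a handful of Bengali characters. This is not the pipeline the model used (that was the Aksharamukha library, which covers all nine scripts and the full standard); the character mappings below are only a small illustrative subset.

```python
# Toy ISO-15919 transliterator for a few Bengali characters.
# Illustrative only -- the model's actual preprocessing used the
# Aksharamukha library, which implements the full standard.

CONSONANTS = {"ক": "ka", "খ": "kha", "ম": "ma", "ন": "na", "ল": "la"}
INDEPENDENT_VOWELS = {"অ": "a", "আ": "ā", "ই": "i"}
VOWEL_SIGNS = {"া": "ā", "ি": "i"}  # a sign replaces the consonant's inherent 'a'

def to_iso15919(text: str) -> str:
    out = []
    for ch in text:
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
        elif ch in INDEPENDENT_VOWELS:
            out.append(INDEPENDENT_VOWELS[ch])
        elif ch in VOWEL_SIGNS:
            # a dependent vowel sign overrides the inherent 'a'
            if out and out[-1].endswith("a"):
                out[-1] = out[-1][:-1]
            out.append(VOWEL_SIGNS[ch])
        else:
            out.append(ch)  # pass through anything unmapped
    return "".join(out)

print(to_iso15919("মা"))  # mā ('mother' in Bengali)
```

Because all nine scripts descend from Brahmi, transliterating them into one Latin-based representation like this lets the model share a single vocabulary across languages.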