Task: fill-mask (mask token: <mask>)

deepampatel/roberta-mlm-mr
152 downloads in the last 30 days

Frameworks: PyTorch, TensorFlow

Contributed by deepampatel (Deepam Patel)

How to use this model directly from the 🤗/transformers library:

			
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("deepampatel/roberta-mlm-mr")
model = AutoModelForMaskedLM.from_pretrained("deepampatel/roberta-mlm-mr")
```
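This model is listed under the fill-mask task with `<mask>` as its mask token. A quick way to query it is therefore the 🤗 `pipeline("fill-mask", ...)` helper. The sketch below makes a few assumptions. It assumes a recent transformers release whose fill-mask pipeline accepts a `top_k` keyword. It also assumes each prediction is a dict with `score` and `token_str` keys. The helper names `load_fill_mask`, `mask_word` and `top_tokens` are hypothetical conveniences, not part of the library.

```python
from typing import Callable, Dict, List

MODEL_ID = "deepampatel/roberta-mlm-mr"
MASK_TOKEN = "<mask>"  # mask token shown for this model's fill-mask task


def load_fill_mask(model_id: str = MODEL_ID):
    """Build a 🤗 fill-mask pipeline (downloads the model on first use)."""
    # Imported lazily so the helpers below can be used without transformers.
    from transformers import pipeline
    return pipeline("fill-mask", model=model_id)


def mask_word(sentence: str, word: str) -> str:
    """Replace the first occurrence of `word` with the model's mask token."""
    if word not in sentence:
        raise ValueError(f"{word!r} does not occur in {sentence!r}")
    return sentence.replace(word, MASK_TOKEN, 1)


def top_tokens(fill_mask: Callable[..., List[Dict]], sentence: str, k: int = 5) -> List[str]:
    """Return up to k predicted tokens for a single-mask sentence, best first."""
    # Assumption: the pipeline accepts `top_k` and returns dicts with these keys.
    predictions = fill_mask(sentence, top_k=k)
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    # Byte-level BPE tokens often carry a leading space, so strip it for display.
    return [p["token_str"].strip() for p in ranked[:k]]
```

Usage, with a made-up Marathi sentence: `top_tokens(load_fill_mask(), mask_word("मी घरी जातो", "घरी"))`. The actual predictions depend on the trained weights, so none are shown here. `top_tokens` takes the pipeline as an argument, so you can test it with any callable that returns the same dict shape.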

Welcome to Roberta-Marathi-MLM

Model Description

This is a small RoBERTa masked language model for Marathi, trained on 1M data samples taken from the OSCAR corpus.
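The card does not say how the 1M samples were selected from the larger OSCAR dump. The following is a minimal sketch under stated assumptions, not the author's procedure. It assumes the Marathi OSCAR text has been saved as a plain UTF-8 file with one sample per line. It also assumes the subset is simply the first 1M non-empty lines. The function names and file paths are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator


def take_samples(lines: Iterable[str], n: int = 1_000_000) -> Iterator[str]:
    """Lazily yield the first `n` non-empty lines, stripped of surrounding whitespace."""
    # Assumption: one line of the corpus file == one training sample.
    stripped = (line.strip() for line in lines)
    return islice((line for line in stripped if line), n)


def write_subset(src_path: str, dst_path: str, n: int = 1_000_000) -> int:
    """Copy at most `n` samples from src_path to dst_path; return how many were written."""
    written = 0
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        # Streaming line by line keeps memory flat even for a multi-GB file.
        for sample in take_samples(src, n):
            dst.write(sample + "\n")
            written += 1
    return written
```

Because `take_samples` uses `islice` over a generator, it stops reading once `n` samples are found. A random subset would need a different technique (for example reservoir sampling), which this sketch does not attempt.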

Training params

  • Dataset - This model was trained on 1M data samples from the OSCAR corpus (https://oscar-corpus.com/). The full dataset is 2.7 GB, but resource constraints limited training to a 1M-sample subset of it. If you are interested in collaborating and have the computational resources to train on the full dataset, you are most welcome to do so.

  • Preprocessing - A ByteLevelBPETokenizer tokenizes the sentences at the byte level (byte-level BPE). The vocabulary size is set to 52k, following the standard value used in the 🤗 examples.
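Byte-level BPE starts from the UTF-8 bytes of the text rather than from its characters. This matters for Marathi because each Devanagari character takes three bytes in UTF-8. As a hedged illustration, the sketch below reimplements the GPT-2 style byte-to-printable-character table that byte-level BPE tokenizers use. It is written for explanation and is not imported from 🤗 tokenizers. It renders text as the symbols the BPE merges start from.

```python
from typing import Dict


def bytes_to_unicode() -> Dict[int, str]:
    """Map every byte value 0-255 to a printable Unicode character (GPT-2 scheme)."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # The remaining bytes (control characters, space, ...) are moved above 255.
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


BYTE_ENCODER = bytes_to_unicode()


def to_byte_level(text: str) -> str:
    """Render text as one printable symbol per UTF-8 byte."""
    return "".join(BYTE_ENCODER[b] for b in text.encode("utf-8"))


word = " मराठी"
print(len(word), len(word.encode("utf-8")))  # 6 16
print(to_byte_level(word))  # the leading space becomes "Ġ"
```

With the 🤗 tokenizers library, the 52k vocabulary would be requested through the `vocab_size` argument, e.g. `ByteLevelBPETokenizer().train(files=[...], vocab_size=52_000, min_frequency=2, special_tokens=[...])`. The card does not state the `min_frequency` or special tokens actually used for this model, so treat those values as placeholders.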

Intended uses & limitations

This model is for anyone who wants to use Marathi language models for tasks such as language generation, translation, and many other use cases.

If you are interested in collaborating, feel free to reach out to me (Deepam).