metadata
language:
  - en
  - hi
  - multilingual
tags:
  - generated_from_trainer
license: cc-by-sa-4.0

muril-en-hi-codemixed

muril-en-hi-codemixed is a masked language model, based on the MuRIL multilingual model.

muril-en-hi-codemixed replaces the tokenizer, vocabulary, and embedding layer of the MuRIL model. The tokenizer and vocabulary are the same as those of the roberta-en-hi-codemixed model. The new embedding weights were initialized from the MuRIL embeddings.
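
This card does not say how the new embedding weights were derived from the MuRIL ones. As a rough illustration only, the sketch below shows one common way to initialize embeddings for a new vocabulary from an existing table: copy the vector when a token exists in the old vocabulary, otherwise average the vectors of the pieces the old tokenizer splits it into, and fall back to small random values when nothing matches. The function `init_new_embeddings`, its parameters, and the toy vocabulary are all hypothetical and are not taken from the actual training code.

```python
import random


def init_new_embeddings(new_vocab, old_vocab, old_embeddings, old_tokenize, dim, seed=0):
    """Build an embedding table for new_vocab from an existing table.

    new_vocab:      list of token strings (list index = new token id)
    old_vocab:      dict mapping token string -> row index in old_embeddings
    old_embeddings: list of vectors (lists of floats), one per old token
    old_tokenize:   callable that splits a string into old-vocabulary pieces
    dim:            embedding dimension, used for the random fallback
    """
    rng = random.Random(seed)
    new_embeddings = []
    for token in new_vocab:
        if token in old_vocab:
            # Token is shared by both vocabularies: copy its vector.
            vec = list(old_embeddings[old_vocab[token]])
        else:
            # Otherwise average the vectors of the old tokenizer's pieces.
            pieces = [p for p in old_tokenize(token) if p in old_vocab]
            if pieces:
                rows = [old_embeddings[old_vocab[p]] for p in pieces]
                vec = [sum(col) / len(rows) for col in zip(*rows)]
            else:
                # Nothing to reuse: small random initialization.
                vec = [rng.gauss(0.0, 0.02) for _ in range(dim)]
        new_embeddings.append(vec)
    return new_embeddings


# Toy data, invented for illustration (not real MuRIL values).
old_vocab = {"na": 0, "hi": 1, "the": 2}
old_embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]


def toy_tokenize(text):
    # Stand-in for the old tokenizer: split into 2-character chunks.
    return [text[i:i + 2] for i in range(0, len(text), 2)]


new_vocab = ["the", "nahi", "zz"]
new_embeddings = init_new_embeddings(
    new_vocab, old_vocab, old_embeddings, toy_tokenize, dim=2
)
```

In this toy run, "the" keeps its old vector, "nahi" receives the mean of the "na" and "hi" vectors, and "zz" is initialized randomly. Whichever scheme was actually used, starting from existing weights rather than random ones typically shortens the further pre-training needed to adapt the model.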

The resulting muril-en-hi-codemixed model was then further pre-trained for two epochs on the same code-mixed English and Hindi corpora as the roberta-en-hi-codemixed model.