---
tags:
- generated_from_trainer
language:
- en
- hi
license: cc-by-sa-4.0
---

# muril-en-hi-codemixed

muril-en-hi-codemixed is a masked language model based on the multilingual [MuRIL](https://huggingface.co/google/muril-base-cased) model.

It replaces MuRIL's tokenizer, vocabulary, and embedding layer.
The tokenizer and vocabulary are the same as those of the [roberta-en-hi-codemixed](https://huggingface.co/cjvt/roberta-en-hi-codemixed) model.
The new embedding weights were initialized from the MuRIL embeddings.
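
This card does not describe the exact initialization procedure. As an illustration only, the sketch below shows one common way to initialize embeddings for a new vocabulary from an existing embedding matrix: tokens shared by both vocabularies copy their old vector, and new tokens take the mean of the old sub-token vectors they split into. The function name `init_embeddings`, the toy vocabularies, and the toy tokenizer are all hypothetical and are not taken from the model's training code.

```python
def init_embeddings(new_vocab, old_vocab, old_emb, old_tokenize):
    """Build an embedding matrix for new_vocab from an old embedding matrix.

    new_vocab    -- list of tokens in the new vocabulary
    old_vocab    -- dict mapping old tokens to row indices in old_emb
    old_emb      -- list of vectors (lists of floats), one per old token
    old_tokenize -- callable splitting a string into old-vocabulary tokens

    NOTE: this is an assumed strategy for illustration, not the
    documented procedure used to build muril-en-hi-codemixed.
    """
    dim = len(old_emb[0])
    new_emb = []
    for token in new_vocab:
        if token in old_vocab:
            # Token exists in both vocabularies: copy its old vector.
            new_emb.append(list(old_emb[old_vocab[token]]))
            continue
        # New token: average the vectors of its old sub-tokens.
        pieces = [p for p in old_tokenize(token) if p in old_vocab]
        if pieces:
            rows = [old_emb[old_vocab[p]] for p in pieces]
            new_emb.append([sum(col) / len(rows) for col in zip(*rows)])
        else:
            # Nothing to borrow from: fall back to a zero vector.
            new_emb.append([0.0] * dim)
    return new_emb


# Toy data (hypothetical), standing in for the old model's vocabulary.
old_vocab = {"film": 0, "ac": 1, "chi": 2}
old_emb = [[1.0, 0.0], [0.0, 2.0], [4.0, 0.0]]
splits = {"acchi": ["ac", "chi"]}  # toy stand-in for the old tokenizer

new_vocab = ["film", "acchi", "zzz"]
new_emb = init_embeddings(new_vocab, old_vocab, old_emb,
                          lambda t: splits.get(t, []))
```

With real models, the same idea would operate on the input embedding tensor of the old model and the two tokenizers' vocabularies; the pure-Python lists here only keep the sketch dependency-free.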

The resulting model was then further pre-trained for two epochs on the same code-mixed English and Hindi corpora as the [roberta-en-hi-codemixed](https://huggingface.co/cjvt/roberta-en-hi-codemixed) model.
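
Because this is a masked language model, it can be queried through the `fill-mask` pipeline of the `transformers` library. The following is a minimal sketch, assuming `transformers` and a backend such as PyTorch are installed. The helper names `mask_word` and `predict_masked` are hypothetical, and the mask token is read from the loaded tokenizer rather than hard-coded.

```python
def mask_word(sentence, target, mask_token):
    """Replace the first occurrence of `target` with the mask token."""
    if target not in sentence:
        raise ValueError(f"{target!r} does not occur in the sentence")
    return sentence.replace(target, mask_token, 1)


def predict_masked(model_id, sentence, target, top_k=5):
    """Return (token, score) pairs predicted for the masked word.

    `model_id` is the Hub repository ID of the model; it is passed in
    by the caller because this card does not state it.
    """
    # Imported lazily so the helper above works without transformers.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_id)
    masked = mask_word(sentence, target, fill.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill(masked, top_k=top_k)]
```

A call could look like `predict_masked("cjvt/muril-en-hi-codemixed", "yeh movie bahut acchi thi", "acchi")`, where the repository ID is an assumption modeled on the roberta-en-hi-codemixed link and the sentence is an invented code-mixed example.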