---
language:
- en
- hi
- multilingual
license: cc-by-sa-4.0
---
# en-hi-codemixed
This is a masked language model based on the CamemBERT architecture.
The en-hi-codemixed model was trained from scratch for 40 epochs on English, Hindi, and code-mixed English-Hindi
corpora.
The corpora consist primarily of web-crawled data, including code-mixed tweets, and focus on conversational
language and the COVID-19 pandemic.
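
Since this is a masked language model, a quick way to try it is the `fill-mask` pipeline from the `transformers` library. The snippet below is a minimal sketch, not an official usage recipe: `MODEL_ID` is a hypothetical placeholder that must be replaced with the actual Hub id or local path of this model, the helper names and the example sentence are illustrative, and the mask token is read from the loaded tokenizer rather than assumed.

```python
# Minimal sketch of querying a masked language model with Hugging Face transformers.
# MODEL_ID is a hypothetical placeholder: replace it with the real Hub id or a
# local directory containing this model's weights and tokenizer.
MODEL_ID = "path-or-hub-id/en-hi-codemixed"


def build_masked_sentence(template: str, mask_token: str) -> str:
    """Replace the single {mask} placeholder in `template` with `mask_token`."""
    if template.count("{mask}") != 1:
        raise ValueError("template must contain exactly one {mask} placeholder")
    return template.replace("{mask}", mask_token)


def top_predictions(model_id: str, template: str, top_k: int = 5):
    """Return (token, score) pairs for the masked position in `template`."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    # Ask the tokenizer for its mask token instead of hard-coding one.
    text = build_masked_sentence(template, fill_mask.tokenizer.mask_token)
    return [(r["token_str"], r["score"]) for r in fill_mask(text, top_k=top_k)]


if __name__ == "__main__":
    # Illustrative code-mixed English-Hindi sentence (an assumption, not from the training data).
    for token, score in top_predictions(MODEL_ID, "Aaj ka weather bahut {mask} hai."):
        print(f"{token}\t{score:.4f}")
```

Reading the mask token from the tokenizer keeps the example independent of the exact vocabulary this model was trained with.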