cjvt
/

lbourdois's picture
Add multilingual to the language tag
1d861c0
|
raw
history blame
427 Bytes
metadata
language:
  - en
  - hi
  - multilingual
license: cc-by-sa-4.0

en-hi-codemixed

This is a masked language model, based on the CamemBERT model architecture. en-hi-codemixed model was trained from scratch on English, Hindi, and codemixed English-Hindi corpora for 40 epochs. The corpora used consists of primarily web crawled data, including codemixed tweets, and focuses on conversational language and covid-19 pandemic.