---
language:
- en
- hi
- multilingual
license: cc-by-sa-4.0
---


# en-hi-codemixed

This is a masked language model, based on the CamemBERT model architecture. 
The en-hi-codemixed model was trained from scratch on English, Hindi, and code-mixed English-Hindi
corpora for 40 epochs.
The corpora consist primarily of web-crawled data, including code-mixed tweets, and focus on conversational
language and the COVID-19 pandemic.
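
Since this is a masked language model, one way to try it is through the `fill-mask` pipeline of the `transformers` library. The snippet below is a minimal sketch, not an official usage recipe: the repository id `cjvt/en-hi-codemixed` is an assumption based on the model name, the helper functions `mask_word` and `predict_masked` are hypothetical, and the default `<mask>` token is the one used by CamemBERT-style tokenizers (the actual token is read from the loaded tokenizer when predictions are made).

```python
# Assumed Hub repository id; adjust it if the model lives elsewhere.
MODEL_ID = "cjvt/en-hi-codemixed"


def mask_word(sentence: str, target: str, mask_token: str = "<mask>") -> str:
    """Replace the first occurrence of `target` in `sentence` with the mask token."""
    if target not in sentence:
        raise ValueError(f"{target!r} does not occur in the sentence")
    return sentence.replace(target, mask_token, 1)


def predict_masked(sentence: str, target: str, model_id: str = MODEL_ID, top_k: int = 5):
    """Mask `target` and return the model's top_k (token, score) candidates."""
    # Imported lazily so mask_word can be used without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    # Use the mask token defined by the model's own tokenizer.
    masked = mask_word(sentence, target, fill_mask.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill_mask(masked, top_k=top_k)]
```

Calling `predict_masked("yeh movie bahut good thi", "good")` would download the model and return candidate tokens for the masked position together with their scores. The example sentence is illustrative only.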