# rohanrajpal/bert-base-multilingual-codemixed-cased-sentiment

---
language:
- hi
- en
tags:
- hi
- en
- codemix
license: "apache-2.0"
datasets:
- SAIL 2017
metrics:
- fscore
- accuracy
---

# BERT codemixed base model for hinglish (cased)

## Model description

Input for the model: any code-mixed Hinglish text
Output for the model: sentiment (0 - negative, 1 - neutral, 2 - positive)

I took a [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) model from Hugging Face and fine-tuned it on the [SAIL 2017](http://www.dasdipankar.com/SAILCodeMixed.html) dataset.

Performance of this model on the SAIL 2017 dataset:

| metric     | score    |
|------------|----------|
| acc        | 0.588889 |
| f1         | 0.582678 |
| acc_and_f1 | 0.585783 |
| precision  | 0.586516 |
| recall     | 0.588889 |

## Intended uses & limitations

#### How to use

Here is how to use this model to classify the sentiment of a given text in *PyTorch*:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("rohanrajpal/bert-base-codemixed-uncased-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("rohanrajpal/bert-base-codemixed-uncased-sentiment")
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```

and in *TensorFlow*:

```python
from transformers import BertTokenizer, TFAutoModelForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('rohanrajpal/bert-base-codemixed-uncased-sentiment')
model = TFAutoModelForSequenceClassification.from_pretrained("rohanrajpal/bert-base-codemixed-uncased-sentiment")
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
```

#### Limitations and bias

Coming soon!
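The model's forward pass returns raw logits rather than a label. Below is a minimal, framework-agnostic sketch of mapping logits to one of the three sentiment classes; the `logits` values are made up for illustration and stand in for the model's `output.logits` for one input:

```python
import math

# Hypothetical raw scores from the classification head for a single text;
# index 0 = negative, 1 = neutral, 2 = positive, per the model card.
logits = [-1.2, 0.3, 2.1]

# Softmax turns logits into class probabilities.
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

# The predicted label is the argmax over the three classes.
labels = ["negative", "neutral", "positive"]
prediction = labels[max(range(len(probs)), key=probs.__getitem__)]
print(prediction)  # -> positive
```

With real model output, the same argmax step applies to `output.logits` directly (e.g. `output.logits.argmax(-1)` in PyTorch).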
## Training data

I fine-tuned this [pretrained model](https://huggingface.co/bert-base-multilingual-cased) on the SAIL 2017 dataset ([download link](http://amitavadas.com/SAIL/Data/SAIL_2017.zip)).

## Training procedure

No preprocessing.

## Eval results

See the performance table under *Model description* above.

### BibTeX entry and citation info

```bibtex
@inproceedings{khanuja-etal-2020-gluecos,
    title = "{GLUEC}o{S}: An Evaluation Benchmark for Code-Switched {NLP}",
    author = "Khanuja, Simran and
      Dandapat, Sandipan and
      Srinivasan, Anirudh and
      Sitaram, Sunayana and
      Choudhury, Monojit",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.329",
    pages = "3575--3585"
}
```
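The `acc_and_f1` row in the metrics table appears to be the unweighted mean of accuracy and F1 (the convention used in common GLUE-style evaluation scripts); a quick check under that assumption reproduces the reported value:

```python
# Scores taken from the SAIL 2017 evaluation table in this card.
acc = 0.588889
f1 = 0.582678

# Assumed definition: simple average of accuracy and F1.
acc_and_f1 = (acc + f1) / 2

# Agrees with the reported 0.585783 to within rounding.
assert abs(acc_and_f1 - 0.585783) < 1e-6
```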