SchuylerH committed on
Commit 3bcb94e
1 Parent(s): c7e321d

Update README.md

Files changed (1)
  1. README.md +40 -14
README.md CHANGED
@@ -19,31 +19,57 @@ widget:
  license: apache-2.0
  ---
 
- # bert-multilingual-go-emtions
 
- This a bert-base-multilingual-cased model finetuned for sentiment analysis in two languages: English, Simplified Chinese.
 
- This cross-language model is intended for direct use as a sentiment analysis model for text in any of the two languages above, or for further finetuning on related sentiment analysis tasks.
 
  ## Training data
 
- Here is the number of product reviews we used for finetuning the model:
 
- | Language | Number of reviews |
- | -------- | ----------------- |
- | English | |
- | Chinese(Machine Translate) | |
 
 
- ## Accuracy
 
- The finetuned model obtained the following accuracy on - in each of the languages:
 
- | Language | Accuracy
- | -------- | ---------------------- |
- | English | -% |
- | Chinese | -% |
 
 
  ## Contact
 
  license: apache-2.0
  ---
 
+ # Multilingual (English and Chinese) GoEmotions Classification Model
 
+ This repository hosts a fine-tuned BERT model for cross-language emotion classification on the GoEmotions dataset. The model was trained on a multilingual dataset of English and machine-translated Chinese texts and classifies a text into one of 28 emotion categories.
 
+ The 28 emotion categories, according to the GoEmotions taxonomy, are: 'admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'grief', 'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief', 'remorse', 'sadness', 'surprise', and 'neutral'.
+
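The README describes single-label classification into the 28 GoEmotions categories. As a minimal sketch, the snippet below turns one 28-dimensional logit vector into a label with a softmax followed by argmax. The label order simply mirrors the listing above and is an assumption: the model's real index-to-label mapping lives in `model.config.id2label` and may differ. The helper names `softmax` and `decode_single_label` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Order follows the README's listing. This is an ASSUMPTION, not the model's
# actual id2label mapping (inspect model.config.id2label for that).
GOEMOTIONS_LABELS: List[str] = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single logit vector."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_single_label(logits: Sequence[float],
                        labels: Sequence[str] = GOEMOTIONS_LABELS) -> Dict[str, object]:
    """Hypothetical helper: pick the highest-probability label (argmax decoding)."""
    if len(logits) != len(labels):
        raise ValueError(f"expected {len(labels)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "score": probs[best]}


# Toy logits: every class at 0.0 except "gratitude".
demo = [0.0] * len(GOEMOTIONS_LABELS)
demo[GOEMOTIONS_LABELS.index("gratitude")] = 4.0
print(decode_single_label(demo)["label"])  # gratitude
```

Argmax over a softmax matches the "one of 28 emotion categories" description; a multi-label variant would instead apply a sigmoid to each logit and keep every label above a threshold.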
+ ## Model Performance
+
+ The model achieves the following scores on the validation set:
+
+ - Accuracy: 85.95%
+ - Precision: 91.99%
+ - Recall: 89.56%
+ - F1 score: 90.17%
+
+ These scores were measured on the combined English and machine-translated Chinese validation set, so they describe performance across both languages together rather than for each language separately.
 
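The performance section reports accuracy, precision, recall and F1 without stating how the per-class scores were averaged. The sketch below is one hedged interpretation: plain-Python accuracy plus macro-averaged precision, recall and F1 for single-label predictions. Macro averaging is an assumption, and `classification_scores` is a hypothetical helper, not the code that produced the reported numbers.

```python
from collections import Counter
from typing import Dict, Sequence


def classification_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 (single-label)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        # Per-class scores; a class that is never predicted (or never present) scores 0.
        p = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        r = true_pos[label] / true_count[label] if true_count[label] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    # Macro average: every class counts equally, regardless of its frequency.
    n = len(labels)
    return {
        "accuracy": sum(true_pos.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


y_true = ["joy", "joy", "anger", "neutral"]
y_pred = ["joy", "anger", "anger", "neutral"]
print(classification_scores(y_true, y_pred))
```

In this toy example macro F1 is 7/9 (about 0.778) while macro precision and recall are both 5/6, so F1 is not the harmonic mean of the averaged precision and recall. Per-class averaging can produce this kind of gap, which is why a reported F1 need not equal 2PR/(P+R) computed from the reported P and R.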
  ## Training data
 
+ The training data combines the original English GoEmotions dataset with a machine-translated Chinese version of it.
+
+ The dataset is split into two parts:
+
+ - **Labeled data**: used for initial training. It includes both English and machine-translated Chinese samples and is further split into a training set (80%) and a validation set (20%).
+ - **Unlabeled data**: used for pseudo-labeling. The model predicts labels for these samples, and the confidently predicted ones are added to the training data. It also includes both English and machine-translated Chinese samples.
+
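The training-data section describes combining English GoEmotions with a machine-translated Chinese copy and splitting the labeled portion 80/20 into training and validation sets. A minimal sketch of that bookkeeping follows, assuming each sample is a `(text, label, language)` tuple and that the split is a seeded random shuffle; both are assumptions, since the README does not say how the split was drawn. `combine_languages` and `train_val_split` are hypothetical helpers, and the sentences are made-up toy data.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[str, str, str]  # (text, label, language): hypothetical layout


def combine_languages(english: Sequence[Tuple[str, str]],
                      chinese: Sequence[Tuple[str, str]]) -> List[Sample]:
    """Tag each (text, label) pair with its language and merge the two sets."""
    return [(t, y, "en") for t, y in english] + [(t, y, "zh") for t, y in chinese]


def train_val_split(samples: Sequence[Sample], val_fraction: float = 0.2,
                    seed: int = 42) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle with a fixed seed, then hold out `val_fraction` for validation."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_val = round(len(items) * val_fraction)
    return items[n_val:], items[:n_val]


english = [("I love this!", "love"), ("What is going on?", "confusion")]
chinese = [("我爱这个！", "love"), ("发生了什么？", "confusion")]
labeled = combine_languages(english, chinese) * 25  # 100 toy samples
train, val = train_val_split(labeled)
print(len(train), len(val))  # 80 20
```

Splitting after combining means an English sentence and its Chinese translation can land on opposite sides of the split; if that matters for evaluation, grouping each original with its translation before splitting avoids the overlap.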
+ ## Training
+
+ The model is trained in two stages:
+
+ 1. Initial training on the labeled data.
+ 2. Pseudo-labeling and retraining: the model predicts labels for the unlabeled data, the most confidently predicted samples are added to the training data, and the model is retrained on this enlarged set.
+
+ Training runs for 20 epochs in total (10 per stage). Precision, recall, and F1 score are logged during training.
+
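The two training stages amount to a self-training (pseudo-labeling) loop. The sketch below shows only the control flow, under stated assumptions: `train_fn` and `predict_proba_fn` are hypothetical callables standing in for the real fine-tuning and inference code, and "most confidently predicted" is interpreted as a top-class probability at or above 0.9, a threshold chosen for illustration. Whether stage 2 continues from the stage-1 weights is not stated in the README, so `train_fn` hides that choice.

```python
from typing import Callable, List, Sequence, Tuple

LabeledExample = Tuple[str, str]  # (text, label)


def select_confident(texts: Sequence[str], probabilities: Sequence[Sequence[float]],
                     labels: Sequence[str], threshold: float = 0.9) -> List[LabeledExample]:
    """Keep (text, predicted_label) pairs whose top probability reaches `threshold`."""
    selected = []
    for text, probs in zip(texts, probabilities):
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            selected.append((text, labels[best]))
    return selected


def two_stage_training(train_fn: Callable, predict_proba_fn: Callable,
                       labeled: Sequence[LabeledExample], unlabeled: Sequence[str],
                       labels: Sequence[str], epochs_per_stage: int = 10,
                       threshold: float = 0.9):
    # Stage 1: fit on the labeled data only.
    model = train_fn(list(labeled), epochs_per_stage)
    # Pseudo-label the unlabeled pool and keep only confident predictions.
    pseudo = select_confident(unlabeled, predict_proba_fn(model, unlabeled), labels, threshold)
    # Stage 2: retrain on labeled + pseudo-labeled data (2 x epochs_per_stage in total).
    model = train_fn(list(labeled) + pseudo, epochs_per_stage)
    return model, pseudo


# Toy stand-ins (hypothetical) that record calls instead of training anything.
calls = []


def toy_train(data, epochs):
    calls.append((len(data), epochs))
    return {"trained_on": len(data)}


def toy_predict(model, texts):
    # Fixed, made-up probabilities over two labels for three texts.
    return [[0.95, 0.05], [0.6, 0.4], [0.02, 0.98]]


model, pseudo = two_stage_training(toy_train, toy_predict,
                                   labeled=[("great job", "admiration")],
                                   unlabeled=["a", "b", "c"],
                                   labels=["admiration", "anger"])
print(pseudo)  # [('a', 'admiration'), ('c', 'anger')]
```

Sample "b" is dropped because its best probability (0.6) falls below the threshold; raising or lowering the threshold trades pseudo-label quantity against label noise.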
+ ## Usage
 
+ The following snippet loads the model and classifies an example sentence:
 
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
 
+ # Load the fine-tuned model and its tokenizer from the Hugging Face Hub
+ tokenizer = AutoTokenizer.from_pretrained("SchuylerH/bert-multilingual-go-emtions")
+ model = AutoModelForSequenceClassification.from_pretrained("SchuylerH/bert-multilingual-go-emtions")
 
+ # "sentiment-analysis" is an alias for the "text-classification" task
+ classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
 
+ text = "I love you."
+ result = classifier(text)
+ print(result)
+ ```
 
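By default the text-classification pipeline returns only the top label, as a list of `{"label": ..., "score": ...}` dicts. Passing `top_k=None` when calling the pipeline (supported in recent `transformers` releases) returns a score for every label. The hypothetical helper `top_emotions` below ranks such a list and keeps the `k` best labels above a minimum score; it is plain Python and is exercised on made-up scores rather than real model output.

```python
from typing import Any, Dict, List, Sequence


def top_emotions(scores: Sequence[Dict[str, Any]], k: int = 3,
                 min_score: float = 0.0) -> List[Dict[str, Any]]:
    """Sort pipeline-style {'label', 'score'} dicts and keep the k best above min_score."""
    ranked = sorted(scores, key=lambda item: item["score"], reverse=True)
    return [item for item in ranked if item["score"] >= min_score][:k]


# Made-up scores in the shape the pipeline returns; not real model output.
example = [
    {"label": "love", "score": 0.71},
    {"label": "joy", "score": 0.12},
    {"label": "admiration", "score": 0.09},
    {"label": "neutral", "score": 0.05},
]
print([item["label"] for item in top_emotions(example, k=2, min_score=0.1)])  # ['love', 'joy']
```

With real output, the list returned by `classifier(text, top_k=None)` would take the place of `example`.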
  ## Contact