FredrikMoller committed
Commit c6ee0a4 · 1 Parent(s): d603bba
README.md ADDED
@@ -0,0 +1,70 @@
+ ---
+ language: sv
+ ---
+
+ ## Swedish BERT models for sentiment analysis
+ [Recorded Future](https://www.recordedfuture.com/), together with [AI Sweden](https://www.ai.se/en), releases two language models for sentiment analysis in Swedish. The two models are based on the [KB/bert-base-swedish-cased](https://huggingface.co/KB/bert-base-swedish-cased) model and have been fine-tuned to solve a multi-label sentiment analysis task.
+
+ One model has been fine-tuned for the sentiment fear and the other for violence. Each model outputs three floats, corresponding to the labels "Negative", "Weak sentiment", and "Strong Sentiment", in that index order.
+ The models have been trained on Swedish data with a conversational focus, collected from various internet sources and forums.
+
+ The models are trained only on Swedish data and only support inference on Swedish input texts. Inference metrics for non-Swedish inputs are not defined; such inputs are considered out-of-domain data.
+
+ The current models are supported with Transformers version >= 4.3.3 and Torch version 1.8.0; compatibility with older versions has not been verified.
+
+ ### Swedish-Sentiment-Fear
+
+ The model can be imported from the transformers library by running
+
+     from transformers import BertForSequenceClassification, BertTokenizerFast
+
+     tokenizer = BertTokenizerFast.from_pretrained("fredrikmollerRF/Swedish-Sentiment-Fear")
+     classifier_fear = BertForSequenceClassification.from_pretrained("fredrikmollerRF/Swedish-Sentiment-Fear")
+
+ Once the model and tokenizer are initialized, the model can be used for inference.
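A minimal sketch of what inference could look like, assuming the classifier is loaded with `BertForSequenceClassification.from_pretrained` and that the three output floats follow the README's label order. The helper names `name_scores` and `score_text` are hypothetical, and the values returned are the model's raw outputs; the README does not say which activation (if any) should be applied before comparing against a classification breakpoint.

```python
# Label names taken from the README, in the stated index order.
LABELS = ("Negative", "Weak sentiment", "Strong Sentiment")


def name_scores(scores):
    """Pair the three output floats with the README label names by index."""
    if len(scores) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} scores, got {len(scores)}")
    return dict(zip(LABELS, (float(s) for s in scores)))


def score_text(text, model_name="fredrikmollerRF/Swedish-Sentiment-Fear"):
    """Run one text through the model and return the raw outputs by label.

    Assumption: the raw outputs are returned unchanged; no softmax or
    sigmoid is applied because the README does not specify one.
    """
    # Imported lazily so name_scores can be used without torch/transformers.
    import torch
    from transformers import BertForSequenceClassification, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained(model_name)
    model = BertForSequenceClassification.from_pretrained(model_name)
    model.eval()

    # max_position_embeddings is 512 in config.json, so truncate to that.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    return name_scores(logits[0].tolist())
```

Calling `score_text("Jag är rädd för att gå ut ikväll.")` would download the model on first use; the same function should work for the violence model by passing `model_name="fredrikmollerRF/Swedish-Sentiment-Violence"`.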
+
+ #### Sentiment definitions
+ #### The strong sentiment includes but is not limited to
+ Texts that:
+
+ - Place an expressive emphasis on fear and/or anxiety
+
+ #### The weak sentiment includes but is not limited to
+ Texts that:
+
+ - Express fear and/or anxiety in a neutral way
+
+ #### Verification metrics
+
+ During training, the model's validation metrics were maximized at the following classification breakpoint.
+
+
+
+ | Classification Breakpoint | F-score | Precision | Recall |
+ |:-------------------------:|:-------:|:---------:|:------:|
+ | 0.45 | 0.8754 | 0.8618 | 0.8895 |
+
+ ### Swedish-Sentiment-Violence
+ The model can be imported from the transformers library by running
+
+     from transformers import BertForSequenceClassification, BertTokenizerFast
+
+     tokenizer = BertTokenizerFast.from_pretrained("fredrikmollerRF/Swedish-Sentiment-Violence")
+     classifier_violence = BertForSequenceClassification.from_pretrained("fredrikmollerRF/Swedish-Sentiment-Violence")
+
+ Once the model and tokenizer are initialized, the model can be used for inference.
+
+ #### Sentiment definitions
+ #### The strong sentiment includes but is not limited to
+ Texts that:
+ - Reference highly violent acts
+ - Hold an aggressive tone
+ #### The weak sentiment includes but is not limited to
+ Texts that:
+ - Include general violent statements that do not fall under the strong sentiment
+ #### Verification metrics
+ During training, the model's validation metrics were maximized at the following classification breakpoint.
+
+ | Classification Breakpoint | F-score | Precision | Recall |
+ |:-------------------------:|:-------:|:---------:|:------:|
+ | 0.35 | 0.7677 | 0.7456 | 0.791 |
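The README does not say which F-score is reported. As a quick sanity check, the sketch below assumes it is the standard F1 (the harmonic mean of precision and recall) and recomputes it from the precision and recall columns of both tables. The function name `f1_score` is only illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) from the two README tables.
reported = {
    "Swedish-Sentiment-Fear": (0.8618, 0.8895, 0.8754),
    "Swedish-Sentiment-Violence": (0.7456, 0.791, 0.7677),
}

for name, (precision, recall, f_reported) in reported.items():
    f_computed = f1_score(precision, recall)
    # The inputs are already rounded to 3-4 decimals, so allow a small gap.
    print(f"{name}: computed {f_computed:.4f}, reported {f_reported:.4f}")
```

Under that assumption the fear model's value reproduces the reported 0.8754, and the violence model's value comes out at about 0.7676, within rounding of the reported 0.7677, so the tables are consistent with F1.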
config.json ADDED
@@ -0,0 +1,35 @@
+ {
+   "_name_or_path": "fredrikmollerRF/Swedish-Sentiment-Fear",
+   "architectures": [
+     "BertForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0",
+     "1": "LABEL_1",
+     "2": "LABEL_2"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0,
+     "LABEL_1": 1,
+     "LABEL_2": 2
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "output_past": true,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "transformers_version": "4.3.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 50325
+ }
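The config keeps the generic `LABEL_0` to `LABEL_2` names, while the README describes the outputs as "Negative", "Weak sentiment", and "Strong Sentiment". A small sketch of how the two might be reconciled is shown below; it assumes that `LABEL_i` corresponds to index `i` in the README's order, which the commit does not state explicitly. The helper name `readable_id2label` is hypothetical.

```python
import json

# Label names from the README, in the order it lists them (assumed to match
# the output indexes 0, 1, 2).
README_LABELS = ["Negative", "Weak sentiment", "Strong Sentiment"]

# The relevant fragment of config.json from this commit.
CONFIG_FRAGMENT = '{"id2label": {"0": "LABEL_0", "1": "LABEL_1", "2": "LABEL_2"}}'


def readable_id2label(config_text, names=README_LABELS):
    """Map each integer class id in the config to a README label name."""
    id2label = json.loads(config_text)["id2label"]
    if len(id2label) != len(names):
        raise ValueError("config and README disagree on the number of labels")
    # JSON object keys are strings, so convert them to integer class ids.
    return {int(class_id): names[int(class_id)] for class_id in id2label}


print(readable_id2label(CONFIG_FRAGMENT))
```

A mapping like this could be applied to model outputs after the fact, without changing the published config.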
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:164a369b17d58a392d4aed97ab0e0aa4311a3008d79f39d70ecbbdbf731ce7be
+ size 498860296
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
tf_model.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:170e42c9bb3bb5727d9c6f447e872fa3afe429a5b14bc6abca542c053355d55c
+ size 499043692
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"do_lower_case": false, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": false, "special_tokens_map_file": "C:\\Users\\Fredrik Möller/.cache\\huggingface\\transformers\\37f2eab7cd9b3716ce0160ea9562138ae9247fb3ea61a2fd0190b16d0970444e.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d", "name_or_path": "KB/bert-base-swedish-cased", "do_basic_tokenize": true, "never_split": null}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff