Commit ad1b519 by asahi417
1 Parent(s): 255ed1d

model update

Files changed (1)
  1. README.md +88 -0
README.md ADDED
@@ -0,0 +1,88 @@
+ ---
+ datasets:
+ - cardiffnlp/tweet_topic_multi
+ metrics:
+ - f1
+ - accuracy
+ model-index:
+ - name: cardiffnlp/roberta-large-tweet-topic-multi-2020
+   results:
+   - task:
+       type: text-classification
+       name: Text Classification
+     dataset:
+       name: cardiffnlp/tweet_topic_multi
+       type: cardiffnlp/tweet_topic_multi
+       args: cardiffnlp/tweet_topic_multi
+       split: test_2021
+     metrics:
+     - name: F1
+       type: f1
+       value: 0.7323655694132079
+     - name: F1 (macro)
+       type: f1_macro
+       value: 0.5794562917377284
+     - name: Accuracy
+       type: accuracy
+       value: 0.4937462775461584
+ pipeline_tag: text-classification
+ widget:
+ - text: "I'm sure the {@Tampa Bay Lightning@} would’ve rather faced the Flyers but man does their experience versus the Blue Jackets this year and last help them a lot versus this Islanders team. Another meat grinder upcoming for the good guys"
+   example_title: "Example 1"
+ - text: "Love to take night time bike rides at the jersey shore. Seaside Heights boardwalk. Beautiful weather. Wishing everyone a safe Labor Day weekend in the US."
+   example_title: "Example 2"
+ ---
+ # cardiffnlp/roberta-large-tweet-topic-multi-2020
+
+ This model is a fine-tuned version of [roberta-large](https://huggingface.co/roberta-large) on the [tweet_topic_multi](https://huggingface.co/datasets/cardiffnlp/tweet_topic_multi) dataset. It was fine-tuned on the `train_2020` split and validated on the `test_2021` split of tweet_topic.
+ The fine-tuning script can be found [here](https://huggingface.co/datasets/cardiffnlp/tweet_topic_multi/blob/main/lm_finetuning.py). The model achieves the following results on the `test_2021` set:
+
+ - F1 (micro): 0.7323655694132079
+ - F1 (macro): 0.5794562917377284
+ - Accuracy: 0.4937462775461584
+
+
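
The card does not state how the three scores above are computed. As a rough sketch, assuming the common multi-label definitions (micro F1 pools true/false positive counts over all labels, macro F1 averages the per-label F1 values, and accuracy is the fraction of tweets whose full label set is predicted exactly), the toy example below illustrates why accuracy can sit well below either F1 score. The function name `multilabel_scores` and the toy data are illustrative and are not taken from the fine-tuning script.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_scores(y_true, y_pred):
    """Return (micro F1, macro F1, exact-match accuracy) for 0/1 label matrices."""
    n_labels = len(y_true[0])
    per_label = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        per_label.append((tp, fp, fn))

    # Micro: pool the counts over labels, then compute a single F1.
    micro = f1(*[sum(c[k] for c in per_label) for k in range(3)])
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(*c) for c in per_label) / n_labels
    # Exact match: a tweet counts only if every label is right.
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return micro, macro, acc


# Toy data: 4 tweets, 3 topics (made up for illustration).
y_true = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 0, 1], [1, 1, 0]]
micro, macro, acc = multilabel_scores(y_true, y_pred)
print(micro, macro, acc)
```

In this toy case one rarely used label is always wrong, which pulls macro F1 below micro F1, and any single label error makes the whole tweet count as a miss for exact-match accuracy.
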
+ ### Usage
+
+ ```python
+ import math
+ import torch
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+ def sigmoid(x):
+     return 1 / (1 + math.exp(-x))
+
+ tokenizer = AutoTokenizer.from_pretrained("cardiffnlp/roberta-large-tweet-topic-multi-2020")
+ model = AutoModelForSequenceClassification.from_pretrained("cardiffnlp/roberta-large-tweet-topic-multi-2020", problem_type="multi_label_classification")
+ model.eval()
+ class_mapping = model.config.id2label
+
+ with torch.no_grad():
+     text = "#NewVideo Cray Dollas- Water- Ft. Charlie Rose- (Official Music Video)- {{URL}} via {@YouTube@} #watchandlearn {{USERNAME}}"
+     tokens = tokenizer(text, return_tensors='pt')
+     output = model(**tokens)
+     # output[0] holds the logits; keep every label whose sigmoid score exceeds 0.5
+     flags = [sigmoid(s) > 0.5 for s in output[0][0].detach().tolist()]
+     topic = [class_mapping[n] for n, i in enumerate(flags) if i]
+ print(topic)
+ ```
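
The thresholding step in the snippet above can be isolated into a small helper. This is a minimal sketch, assuming the logits arrive as a plain list of floats and that `id2label` is a dict shaped like `model.config.id2label`; the names `stable_sigmoid` and `logits_to_topics`, the `threshold` parameter, and the toy label names are hypothetical. The sigmoid is written in two branches so that `math.exp` does not overflow on very negative logits.

```python
import math


def stable_sigmoid(x):
    """Sigmoid that avoids overflow in math.exp for large-magnitude inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so z is in (0, 1) or underflows to 0.0
    return z / (1.0 + z)


def logits_to_topics(logits, id2label, threshold=0.5):
    """Return the labels whose sigmoid probability exceeds the threshold."""
    return [id2label[i] for i, s in enumerate(logits) if stable_sigmoid(s) > threshold]


# Toy stand-in for model.config.id2label (label names are made up).
toy_id2label = {0: "label_a", 1: "label_b", 2: "label_c"}
toy_logits = [2.3, -1000.0, 0.1]

topics = logits_to_topics(toy_logits, toy_id2label)
strict = logits_to_topics(toy_logits, toy_id2label, threshold=0.9)
print(topics, strict)
```

With the default threshold of 0.5 the check is equivalent to testing `logit > 0`. Raising the threshold keeps only the more confident labels, which generally trades recall for precision.
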
+
+ ### Reference
+
+ ```
+
+ @inproceedings{dimosthenis-etal-2022-twitter,
+     title = "{T}witter {T}opic {C}lassification",
+     author = "Antypas, Dimosthenis and
+       Ushio, Asahi and
+       Camacho-Collados, Jose and
+       Neves, Leonardo and
+       Silva, Vitor and
+       Barbieri, Francesco",
+     booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
+     month = oct,
+     year = "2022",
+     address = "Gyeongju, Republic of Korea",
+     publisher = "International Committee on Computational Linguistics"
+ }
+
+ ```