# tweet-topic-latest-single

This is a RoBERTa-base model trained on 168.86M tweets posted up to the end of September 2022 and fine-tuned for single-label topic classification on a corpus of 6,997 [tweets](https://huggingface.co/datasets/cardiffnlp/tweet_topic_single).
The original RoBERTa-base model can be found [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-sep2022). This model is suitable for English.

- Reference Papers: [TimeLMs paper](https://arxiv.org/abs/2202.03829), [TweetTopic](https://arxiv.org/abs/2209.09824)
- Git Repo: [TimeLMs official repository](https://github.com/cardiffnlp/timelms)
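
The fine-tuning corpus linked above is hosted on the Hugging Face Hub. As a minimal sketch, it can be loaded with the `datasets` library (the exact split names are not listed here; check the dataset card):

```python
from datasets import load_dataset

# Load the TweetTopic single-label corpus used for fine-tuning.
# Printing the DatasetDict shows which splits are available.
dataset = load_dataset("cardiffnlp/tweet_topic_single")
print(dataset)
```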

<b>Labels</b>:
- 0 -> arts_&_culture
- 1 -> business_&_entrepreneurs
- 2 -> pop_culture
- 3 -> daily_life
- 4 -> sports_&_gaming
- 5 -> science_&_technology
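
The same mapping ships with the model configuration as `id2label`, so it can be inspected programmatically rather than hard-coded (a minimal sketch):

```python
from transformers import AutoConfig

# The id -> label mapping above is stored in the model config as id2label.
config = AutoConfig.from_pretrained("cardiffnlp/tweet-topic-latest-single")
print(config.id2label)
```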

## Full classification example

```python
from transformers import AutoModelForSequenceClassification, TFAutoModelForSequenceClassification
from transformers import AutoTokenizer
import numpy as np
from scipy.special import softmax


MODEL = "cardiffnlp/tweet-topic-latest-single"
tokenizer = AutoTokenizer.from_pretrained(MODEL)

# PT (PyTorch)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
class_mapping = model.config.id2label

text = "Tesla stock is on the rise!"
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)

scores = output[0][0].detach().numpy()
scores = softmax(scores)

# TF (TensorFlow), equivalent to the PyTorch block above
#model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
#class_mapping = model.config.id2label
#text = "Tesla stock is on the rise!"
#encoded_input = tokenizer(text, return_tensors='tf')
#output = model(**encoded_input)
#scores = output[0][0]
#scores = softmax(scores)

# Rank labels from most to least probable and print them with their scores
ranking = np.argsort(scores)
ranking = ranking[::-1]
for i in range(scores.shape[0]):
    l = class_mapping[ranking[i]]
    s = scores[ranking[i]]
    print(f"{i+1}) {l} {np.round(float(s), 4)}")
```

Output:

```
1) business_&_entrepreneurs 0.8929
2) sports_&_gaming 0.0478
3) science_&_technology 0.0185
4) daily_life 0.0178
5) arts_&_culture 0.0128
6) pop_culture 0.0102
```
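
For quick experiments, the same checkpoint can also be run through the generic `pipeline` API instead of the explicit code above (a minimal sketch; `top_k=None` asks the pipeline for scores on all six labels):

```python
from transformers import pipeline

# Text-classification pipeline over the same checkpoint;
# top_k=None returns every label with its score rather than only the best one.
pipe = pipeline("text-classification", model="cardiffnlp/tweet-topic-latest-single", top_k=None)
print(pipe("Tesla stock is on the rise!"))
```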