cardiffnlp committed
Commit dafba98 • 1 Parent(s): 12c1452
.ipynb_checkpoints/README-checkpoint.md ADDED
@@ -0,0 +1,77 @@

# Twitter-roBERTa-base

This is a roBERTa-base model trained on ~58M tweets, described and evaluated in the [_TweetEval_ benchmark (Findings of EMNLP 2020)](https://arxiv.org/pdf/2010.12421.pdf). To evaluate this and other LMs on Twitter-specific data, please refer to the [TweetEval official repository](https://github.com/cardiffnlp/tweeteval).
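
Tweets typically contain user handles and URLs. The [TweetEval repository](https://github.com/cardiffnlp/tweeteval) normalizes these before feeding text to its models; the sketch below follows that convention (the `preprocess` helper and its placeholder strings are illustrative, not part of the original card):

```python
def preprocess(text):
    # Replace user handles and links with generic placeholders,
    # mirroring the normalization used in the TweetEval codebase.
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)

print(preprocess("@cardiffnlp nice paper! https://arxiv.org/abs/2010.12421"))
# @user nice paper! http
```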

## Example Masked Language Model

```python
from transformers import pipeline, AutoTokenizer
import numpy as np

MODEL = "cardiffnlp/twitter-roberta-base"
fill_mask = pipeline("fill-mask", model=MODEL, tokenizer=MODEL)
tokenizer = AutoTokenizer.from_pretrained(MODEL)

def print_candidates(candidates):
    # Print the top five candidates for the masked token with their scores.
    for i in range(5):
        token = tokenizer.decode(candidates[i]['token'])
        score = np.round(candidates[i]['score'], 4)
        print(f"{i+1}) {token} {score}")

texts = [
    "I am so <mask> 😊",
    "I am so <mask> 😒"
]
for text in texts:
    print(f"{'-'*30}\n{text}")
    candidates = fill_mask(text)
    print_candidates(candidates)
```

Output:

```
------------------------------
I am so <mask> 😊
1) happy 0.402
2) excited 0.1441
3) proud 0.143
4) grateful 0.0669
5) blessed 0.0334
------------------------------
I am so <mask> 😒
1) sad 0.2641
2) sorry 0.1605
3) tired 0.138
4) sick 0.0278
5) hungry 0.0232
```
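
By default the pipeline returns five candidates per mask; recent `transformers` releases also accept a `top_k` call argument to request more. A short sketch reusing the objects defined above:

```python
# Ask the pipeline for ten candidates instead of the default five.
candidates = fill_mask("I am so <mask> 😊", top_k=10)
for i, c in enumerate(candidates):
    print(f"{i+1}) {tokenizer.decode(c['token'])} {np.round(c['score'], 4)}")
```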

## Example Feature Extraction

```python
from transformers import AutoTokenizer, AutoModel, TFAutoModel
import numpy as np

MODEL = "cardiffnlp/twitter-roberta-base"
text = "Good night 😊"

tokenizer = AutoTokenizer.from_pretrained(MODEL)

# PyTorch
encoded_input = tokenizer(text, return_tensors='pt')
model = AutoModel.from_pretrained(MODEL)
features = model(**encoded_input)
features = features[0].detach().cpu().numpy()  # last hidden state: (1, seq_len, hidden_size)
features_mean = np.mean(features[0], axis=0)   # mean-pool over tokens into one vector
#features_max = np.max(features[0], axis=0)

# # TensorFlow
# encoded_input = tokenizer(text, return_tensors='tf')
# model = TFAutoModel.from_pretrained(MODEL)
# features = model(encoded_input)
# features = features[0].numpy()
# features_mean = np.mean(features[0], axis=0)
# #features_max = np.max(features[0], axis=0)
```
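
The mean-pooled vector can serve as a tweet embedding. A minimal sketch (the `embed` helper is illustrative, not part of the original card) comparing two texts with cosine similarity via the PyTorch objects above:

```python
def embed(text):
    # Mean-pool the last hidden state into a single vector for `text`.
    encoded = tokenizer(text, return_tensors='pt')
    hidden = model(**encoded)[0].detach().cpu().numpy()
    return np.mean(hidden[0], axis=0)

a, b = embed("Good night 😊"), embed("Sleep well!")
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"cosine similarity: {cosine:.3f}")
```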

README.md CHANGED
@@ -48,4 +48,30 @@ I am so <mask> 😒
 ```
 
 ## Example Feature Extraction
-TODO
+
+```python
+from transformers import AutoTokenizer, AutoModel, TFAutoModel
+import numpy as np
+
+MODEL = "cardiffnlp/twitter-roberta-base"
+text = "Good night 😊"
+
+tokenizer = AutoTokenizer.from_pretrained(MODEL)
+
+# PyTorch
+encoded_input = tokenizer(text, return_tensors='pt')
+model = AutoModel.from_pretrained(MODEL)
+features = model(**encoded_input)
+features = features[0].detach().cpu().numpy()
+features_mean = np.mean(features[0], axis=0)
+#features_max = np.max(features[0], axis=0)
+
+# # TensorFlow
+# encoded_input = tokenizer(text, return_tensors='tf')
+# model = TFAutoModel.from_pretrained(MODEL)
+# features = model(encoded_input)
+# features = features[0].numpy()
+# features_mean = np.mean(features[0], axis=0)
+# #features_max = np.max(features[0], axis=0)
+
+```