arazd committed on
Commit 0a08038 (1 parent: add5bb0)

Update README.md

Files changed (1)
  1. README.md +22 -0
README.md CHANGED
@@ -17,4 +17,26 @@ from transformers import BertForSequenceClassification, AutoTokenizer
  mpath = 'arazd/miread'
  model_hub = BertForSequenceClassification.from_pretrained(mpath)
  tokenizer = AutoTokenizer.from_pretrained(mpath)
+ ```
+
+ To use MIReAD for feature extraction and classification:
+ ```python
+ # sample abstract text
+ abstr = 'Learning semantically meaningful representations from scientific documents can ...'
+ source_len = 512
+ inputs = tokenizer(abstr,
+                    max_length=source_len,
+                    padding='max_length',  # replaces the deprecated pad_to_max_length=True
+                    truncation=True,
+                    return_tensors="pt")
+
+ # classification (getting logits over 2,734 journal classes)
+ out = model_hub(**inputs)  # model_hub is the classifier loaded above
+ logits = out.logits
+
+ # feature extraction (getting 768-dimensional feature profiles)
+ out = model_hub.bert(**inputs)
+ # IMPORTANT: use the [CLS] token representation as the document-level representation (hence, 0th idx)
+ feature = out.last_hidden_state[:, 0, :]
+
  ```
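
The README snippet stops at raw `logits` and `feature` tensors. As a minimal sketch of what might come next, the helpers below rank classes by softmax probability and compare two document representations by cosine similarity. The names `top_k_classes` and `cosine_similarity` are hypothetical and not part of the MIReAD repository, and the toy lists stand in for real model outputs (for example `logits[0].tolist()` or `feature[0].tolist()`); mapping a class index to a journal name is not covered by the README and is left out here.

```python
import math


def top_k_classes(logits, k=5):
    """Return the k (class_index, probability) pairs with the highest softmax probability."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins (assumption): real inputs would have 2,734 logits and 768 features.
toy_logits = [0.1, 2.0, -1.0, 0.5]
print(top_k_classes(toy_logits, k=2))

toy_feature_a = [1.0, 0.0, 1.0]
toy_feature_b = [1.0, 0.0, 0.0]
print(cosine_similarity(toy_feature_a, toy_feature_b))
```

Plain Python lists keep the sketch dependency-free; with real model outputs the same steps could equally be done on the tensors directly.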