---
library_name: transformers
tags:
- roberta
datasets:
- pubmed
language:
- en
---
# Model Card for MDDDDR/roberta_large_NER

base_model: [FacebookAI/xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large)

Model configuration:

- hidden_size: 1024
- max_position_embeddings: 512
- num_attention_heads: 16
- num_hidden_layers: 24
- vocab_size: 250002
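
If you want to confirm these values programmatically, a minimal sketch using `AutoConfig` (which fetches only the configuration file, not the weights):

```python
from transformers import AutoConfig

# fetch just config.json for the checkpoint and print the fields listed above
config = AutoConfig.from_pretrained('MDDDDR/roberta_large_NER')

for field in ('hidden_size', 'max_position_embeddings',
              'num_attention_heads', 'num_hidden_layers', 'vocab_size'):
    print(field, '=', getattr(config, field))
```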

# Basic usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# tag mapping (BIO scheme)
id2tag = {0: 'O', 1: 'B_MT', 2: 'I_MT'}

# load model & tokenizer
MODEL_NAME = 'MDDDDR/roberta_large_NER'

model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# prepare input
text = 'mental disorder can also contribute to the development of diabetes through various mechanism including increased stress, poor self care behavior, and adverse effect on glucose metabolism.'
tokenized = tokenizer(text, return_tensors='pt')

# forward pass (inference only, so no gradients needed)
with torch.no_grad():
    output = model(**tokenized)

# argmax over the label dimension; drop the <s> and </s> special tokens
preds = output.logits.argmax(dim=-1)[0, 1:-1].tolist()

# one predicted tag per subword token
for token, pred in zip(tokenizer.tokenize(text), preds):
    print(f"{id2tag[pred]}\t{token}")
    # B_MT	▁mental
    # B_MT	▁disorder
```
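
The loop above prints one BIO tag per subword token. To recover whole entity strings rather than per-token tags, consecutive `B_MT`/`I_MT` tags can be merged using the fast tokenizer's character offsets. A minimal sketch, reusing `model`, `tokenizer`, `id2tag`, and `text` from above (the grouping logic here is illustrative, not part of the model card):

```python
# re-encode with offsets so each subword maps back to a character span
# (requires a fast tokenizer, which XLM-RoBERTa uses by default)
enc = tokenizer(text, return_tensors='pt', return_offsets_mapping=True)
offsets = enc.pop('offset_mapping')[0].tolist()

with torch.no_grad():
    logits = model(**enc).logits

tags = [id2tag[i] for i in logits.argmax(dim=-1)[0].tolist()]

entities, current = [], None
for (start, end), tag in zip(offsets, tags):
    if start == end:                   # special tokens have empty offsets
        continue
    if tag == 'B_MT':                  # a new entity begins
        if current:
            entities.append(current)
        current = [start, end]
    elif tag == 'I_MT' and current:    # extend the running entity
        current[1] = end
    else:                              # 'O' closes any open entity
        if current:
            entities.append(current)
        current = None
if current:
    entities.append(current)

print([text[s:e] for s, e in entities])
```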