---
language: "en"
tags:
- buy-intent
- sell-intent
- consumer-intent
widget:
- text: "Flutoprazepam (Restas) is a drug which is a benzodiazepine. It was patented in Japan by Sumitomo."
---
# Chemical vs Pharmaceutical Domain Document Classifier
A chemical-domain language model fine-tuned on 13K chemical and 14K pharmaceutical Wikipedia articles, split into paragraphs, to classify text as pharmaceutical or chemical.

| Train Loss | Validation Acc. | Test Acc. |
| :--------- | :-------------: | --------: |
| 0.17       | 0.928           | 0.927     |

# Dataset
The dataset and its splits are available on Kaggle: [https://www.kaggle.com/shahrukhkhan/pharma-vs-chemicals-domain-classification](https://www.kaggle.com/shahrukhkhan/pharma-vs-chemicals-domain-classification)

# Label Mappings
LABEL_0 => **"PHARMACEUTICAL"** <br/>
LABEL_1 => **"CHEMICAL"**
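
The raw labels can be translated into readable names with a small lookup. The sketch below assumes predictions arrive as dictionaries with `label` and `score` keys, which is the shape the Transformers text-classification `pipeline` returns. `LABEL_NAMES` and `readable` are hypothetical names used only for illustration.

```python
# Mapping copied from the label mappings above.
LABEL_NAMES = {"LABEL_0": "PHARMACEUTICAL", "LABEL_1": "CHEMICAL"}


def readable(prediction):
    """Return a copy of a prediction dict with the raw label replaced."""
    return {**prediction, "label": LABEL_NAMES[prediction["label"]]}


# The score here is made up for illustration; it is not a model output.
print(readable({"label": "LABEL_0", "score": 0.5}))
```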

## Usage in Transformers

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hub ID of the fine-tuned classifier
model_name = "recobo/chemical-bert-uncased-pharmaceutical-chemical-classifier"

# Load the tokenizer and the sequence-classification model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
```
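
Once loaded, the model can classify a paragraph by taking a softmax over its two output logits. The following is a minimal sketch, assuming a PyTorch backend and that logit index 0 and 1 correspond to `LABEL_0` and `LABEL_1` as listed under Label Mappings. `MODEL_ID`, `ID2LABEL`, `softmax`, `decode`, and `classify` are hypothetical names introduced here, not part of the model repository.

```python
import math

MODEL_ID = "recobo/chemical-bert-uncased-pharmaceutical-chemical-classifier"
# Assumed index order, following the Label Mappings section.
ID2LABEL = {0: "PHARMACEUTICAL", 1: "CHEMICAL"}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Pick the most likely class and return (label, probability)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


def classify(text):
    """Run the classifier on one paragraph and return (label, probability)."""
    # Imported lazily so the helpers above work without torch installed.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits[0].tolist()
    return decode(logits)
```

Calling `classify("Flutoprazepam (Restas) is a drug which is a benzodiazepine.")` downloads the weights on first use and returns a label with its probability. For repeated calls, loading the tokenizer and model once outside the function would avoid reloading them each time.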