---
license: mit
language:
- en
tags:
- biology
- medical
- language models
- BioNLP
---

In-BoXBART
=============

An instruction-based unified model for performing various biomedical tasks.

You may want to check out 
* Our paper (NAACL 2022 Findings): [In-BoXBART: Get Instructions into Biomedical Multi-Task Learning](https://aclanthology.org/2022.findings-naacl.10/)
* GitHub: [Click Here](https://github.com/Mihir3009/In-BoXBART)

This work explores the impact of instructional prompts on biomedical Multi-Task Learning. We introduce the BoX, a collection of 32 instruction tasks for Biomedical NLP across various (X) categories. Using this meta-dataset, we propose a unified model termed In-BoXBART that can jointly learn all tasks of the BoX without any task-specific modules. To the best of our knowledge, this is the first attempt to propose a unified model in the biomedical domain and to use instructions to achieve generalization across several biomedical tasks.

How to Use
=============

You can load the model directly with Transformers instead of downloading it manually. BART-base is the backbone of our model. Here is how to load it in PyTorch:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("cogint/in-boxbart")

model = AutoModelForSeq2SeqLM.from_pretrained("cogint/in-boxbart")
```
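
If you prefer a single call that wraps tokenization, generation, and decoding, the Transformers `text2text-generation` pipeline should also work with this seq2seq checkpoint. This is a minimal sketch; the prompt string here is only a placeholder, see the full instruction format in the Inference Example below:

```python
from transformers import pipeline

# Bundles the tokenizer, the model, and the generation loop in one object
pipe = pipeline("text2text-generation", model="cogint/in-boxbart")

# The prompt should follow the instruction format shown in the Inference Example below
result = pipe("Instruction: ... ; Instance: input: <your text here>, output: ?")
print(result[0]["generated_text"])
```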
Or just clone the model repo:
```bash
git lfs install
git clone https://huggingface.co/cogint/in-boxbart
```

Inference Example
=============

Here, we provide an example for the "Document Classification" (HoC dataset) task. Once you load the model from Hugging Face for inference, you can append the instruction given in `./templates` for that particular dataset to the input instance. Below is an example of one instance.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("cogint/in-boxbart")
model = AutoModelForSeq2SeqLM.from_pretrained("cogint/in-boxbart")

# The input shows how we append the instruction from our template file for the HoC dataset to the instance.

input = "Instruction: Definition: In this task, you are given a medical text related to cancer. Your job is to classify into zero or more classes from (1) Sustaining proliferative signaling, (2) Resisting cell death, (3) Genomic instability and mutation, (4) Activating invasion and metastasis, (5) Tumor promoting inflammation, (6) Evading growth suppressors, (7) Inducing angiogenesis (8) Enabling replicative immortality, (9) Avoiding immune destruction and (10) Cellular energetics., Positive Examples: [[input: Studies of cell-cycle progression showed that the anti-proliferative effect of Fan was associated with an increase in the G1/S phase of PC3 cells ., output: Evading growth suppressors, Sustaining proliferative signaling, explanation: Given text is classified into two categories, hence, generated label is 'Evading growth suppressors, Sustaining proliferative signaling'.] ]; Instance: input: Similar to previous studies utilizing IGF-1 , pretreatment with Roscovitine leads to a significant up-regulation of p21 expression and a significant decrease in the number of PCNA positive cells ., output: ?"

# Tokenize the instruction + instance and return PyTorch tensors
tokenized_input = tokenizer(input, return_tensors="pt")

# Ideal output for this input is 'Sustaining proliferative signaling'

output_ids = model.generate(**tokenized_input)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
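
The layout of the files under `./templates` is not shown in this card, so the helper below is only a sketch: it assumes the instruction for a dataset is stored as plain text (the file name `hoc.txt` is hypothetical) and reproduces the `Instruction: ...; Instance: input: ..., output: ?` format used in the example above.

```python
from pathlib import Path
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("cogint/in-boxbart")
model = AutoModelForSeq2SeqLM.from_pretrained("cogint/in-boxbart")

def build_prompt(template_path: str, instance_text: str) -> str:
    """Prepend a dataset instruction (assumed to be stored as plain text) to an instance."""
    instruction = Path(template_path).read_text().strip()
    return f"Instruction: {instruction}; Instance: input: {instance_text}, output: ?"

def predict(prompt: str) -> str:
    """Tokenize the prompt, generate, and decode the predicted label(s)."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Hypothetical template file name -- check ./templates in the GitHub repo for the actual files
prompt = build_prompt("./templates/hoc.txt", "Similar to previous studies utilizing IGF-1 , ...")
print(predict(prompt))
```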


BibTeX Entry and Citation Info
===============
If you are using our model, please cite our paper:

```bibtex
@inproceedings{parmar-etal-2022-boxbart,
    title = "In-{B}o{XBART}: Get Instructions into Biomedical Multi-Task Learning",
    author = "Parmar, Mihir  and
      Mishra, Swaroop  and
      Purohit, Mirali  and
      Luo, Man  and
      Mohammad, Murad  and
      Baral, Chitta",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-naacl.10",
    doi = "10.18653/v1/2022.findings-naacl.10",
    pages = "112--128",
}
```