## Introduction
**Instruction-Tagger** is a model for labeling instructions with task tags. The resulting tags make it easy to inspect and adjust the proportion of each task in a dataset.
#### Example Input
>What are the main differences between Type 1 and Type 2 diabetes, and how do their treatment approaches differ?
#### Example Output
>Medicine
## Quick Start
The following code snippet shows how to load the tokenizer and model and how to classify an instruction.
```python
import torch
from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification

# Load the classifier and its tokenizer from the Hugging Face Hub.
model = DebertaV2ForSequenceClassification.from_pretrained(
    'alibaba-pai/Instruction-Tagger', num_labels=33
).cuda()
tokenizer = DebertaV2Tokenizer.from_pretrained('alibaba-pai/Instruction-Tagger')

# Mapping from class ids to the 33 task tags.
labels = {
    0: 'Common-Sense',
    1: 'History',
    2: 'Law',
    3: 'Code Generation',
    4: 'Technology',
    5: 'Sport',
    6: 'Code Debug',
    7: 'Toxicity',
    8: 'Counterfactual',
    9: 'Physics',
    10: 'Academic Writing',
    11: 'Philosophy',
    12: 'Music',
    13: 'Math',
    14: 'Writing',
    15: 'Complex Format',
    16: 'Art',
    17: 'Grammar',
    18: 'Computer Science',
    19: 'Economy',
    20: 'Paraphrase',
    21: 'Reasoning',
    22: 'Medicine',
    23: 'Biology',
    24: 'Health',
    25: 'Ethics',
    26: 'Chemistry',
    27: 'Multilingual',
    28: 'Ecology',
    29: 'Roleplay',
    30: 'Entertainment',
    31: 'Others',
    32: 'Literature',
}

def task_cls(text):
    """Return the predicted task tag for an instruction string."""
    inputs = tokenizer(text, return_tensors="pt", padding=True).to("cuda")
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_class_id = logits.argmax().item()
    return labels[predicted_class_id]

instruct = """
What are the main differences between Type 1 and Type 2 diabetes, and how do their treatment approaches differ?
"""
tag = task_cls(instruct)
print(tag)  # Medicine
```
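The introduction mentions adjusting the proportion of tasks in a dataset. Below is a minimal sketch of how the predicted tags can drive such rebalancing, assuming the `task_cls` function from the snippet above; the `dataset` list and the `cap` threshold are hypothetical placeholders for your own data and policy.

```python
from collections import Counter

# Hypothetical list of instructions; replace with your own dataset.
dataset = [
    "Write a short poem about autumn.",
    "Fix the off-by-one error in this loop.",
    "What are the main differences between Type 1 and Type 2 diabetes?",
]

# Tag every instruction, then inspect the task distribution.
tags = [task_cls(instruction) for instruction in dataset]
print(Counter(tags))  # e.g. Counter({'Writing': 1, 'Code Debug': 1, 'Medicine': 1})

# Down-sample over-represented tasks: keep at most `cap` examples per tag.
cap = 1
kept, counts = [], Counter()
for instruction, tag in zip(dataset, tags):
    if counts[tag] < cap:
        kept.append(instruction)
        counts[tag] += 1
```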
## Evaluation
To assess the accuracy of task classification, we manually evaluated a sample of 100 entries not included in the training set, obtaining a classification precision of 92%.
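For reference, a hedged sketch of how such a spot check could be reproduced, assuming a hypothetical list of manually annotated `(instruction, gold_tag)` pairs; the authors' 100-entry evaluation set is not released, and the 92% figure comes from their sample, not from this code.

```python
# Hypothetical manually annotated pairs; supply your own annotations here.
eval_set = [
    ("What are the main differences between Type 1 and Type 2 diabetes?", "Medicine"),
    ("Prove that the square root of 2 is irrational.", "Math"),
]

# Fraction of predictions that match the manual annotation.
correct = sum(task_cls(instruction) == gold for instruction, gold in eval_set)
print(f"precision: {correct / len(eval_set):.2%}")
```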
## Citation
If you find our work helpful, please cite it!
```bibtex
@misc{TAPIR,
title={Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning},
author={Yuanhao Yue and Chengyu Wang and Jun Huang and Peng Wang},
year={2024},
eprint={2405.13448},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2405.13448},
}
```