Update README.md

# Model Card for DeBERTa-v3-small-tasksource-nli

[DeBERTa-v3-small](https://hf.co/microsoft/deberta-v3-small) with a context length of 1680, fine-tuned on tasksource for 250k steps. I oversampled long NLI tasks (ConTRoL, doc-nli).
Training data includes helpsteer v1/v2, logical reasoning tasks (FOLIO, FOL-nli, LogicNLI...), OASST, hh/rlhf, linguistics-oriented NLI tasks, tasksource-dpo, and fact verification tasks.

This model is suitable for long-context NLI or as a backbone for fine-tuning reward models or classifiers.

This checkpoint has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI), and can be used for:
- Zero-shot entailment-based classification for arbitrary labels [ZS].
- Natural language inference [NLI].
- Hundreds of previous tasks with tasksource-adapters [TA].
- Further fine-tuning on a new task or tasksource task (classification, token classification or multiple-choice) [FT].

| test_name                   | accuracy |
|:----------------------------|---------:|
| anli/a1                     |     57.2 |
| anli/a2                     |     46.1 |
| anli/a3                     |     47.2 |
| nli_fever                   |     71.7 |
| FOLIO                       |     47.1 |
| ConTRoL-nli                 |     52.2 |
| cladder                     |     52.8 |
| zero-shot-label-nli         |     70.0 |
| chatbot_arena_conversations |     67.8 |
| oasst2_pairwise_rlhf_reward |     75.6 |
| doc-nli                     |     75.0 |

Zero-shot GPT-4 scores 61% on FOLIO (logical reasoning), 62% on cladder (probabilistic reasoning) and 56.4% on ConTRoL (long-context NLI).

# [ZS] Zero-shot classification pipeline

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="tasksource/deberta-small-long-nli")

text = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
classifier(text, candidate_labels)
```

# [NLI] Natural language inference pipeline

```python
from transformers import pipeline

pipe = pipeline("text-classification", model="tasksource/deberta-small-long-nli")
pipe([dict(text='there is a cat',
           text_pair='there is a black cat')])  # list of (premise, hypothesis) dicts
# [{'label': 'neutral', 'score': 0.9952911138534546}]
```
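Documents longer than the model's 1680-token context need chunking. A common workaround (not part of this repository; `score_nli` below is a hypothetical stand-in for the `pipe(...)` call) is to score the hypothesis against overlapping windows of the document and aggregate, e.g. by taking the maximum entailment score:

```python
def windows(tokens, size=1680, stride=840):
    # Overlapping windows so evidence spanning a boundary is not lost.
    for start in range(0, len(tokens), stride):
        yield tokens[start:start + size]
        if start + size >= len(tokens):
            break

def score_nli(premise_tokens, hypothesis):
    # Hypothetical stand-in for the NLI pipeline: returns a fake entailment probability.
    return 0.9 if "cat" in premise_tokens else 0.1

doc = ["filler"] * 3000 + ["there", "is", "a", "black", "cat"]  # toy long document
hypothesis = "there is a cat"

scores = [score_nli(w, hypothesis) for w in windows(doc)]
verdict = max(scores)  # the document entails the hypothesis if any window does
print(verdict)  # 0.9
```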

# [FT] Tasknet: 3 lines fine-tuning

```python
# !pip install tasknet
import tasknet as tn

hparams = dict(model_name='tasksource/deberta-small-long-nli', learning_rate=2e-5)
model, trainer = tn.Model_Trainer([tn.AutoTask("glue/rte")], hparams)
trainer.train()
```

### Software and training details

The model was trained on 600 tasks for 250k steps with a batch size of 384 and a peak learning rate of 2e-5. Training took 14 days on an Nvidia A30 24GB GPU.
This checkpoint is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice models used the same classification layers. For classification tasks, models shared weights if their labels matched.
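The CLS-dropping trick described above can be sketched as follows (hypothetical token names, not the actual training code): each example is prefixed with a task-specific CLS embedding, which is swapped for the shared one 10% of the time so the model also works when no task is specified:

```python
import random

random.seed(0)

SHARED_CLS = "[CLS]"  # generic CLS, usable at inference without task information
TASK_CLS = {"mnli": "[CLS_mnli]", "glue/rte": "[CLS_rte]"}  # hypothetical per-task tokens

def pick_cls(task, p_drop=0.1):
    # With probability p_drop, fall back to the shared CLS embedding.
    return SHARED_CLS if random.random() < p_drop else TASK_CLS[task]

picks = [pick_cls("mnli") for _ in range(10_000)]
drop_rate = picks.count(SHARED_CLS) / len(picks)
print(round(drop_rate, 2))  # close to 0.1
```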

Training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olU

More details in this [article](https://arxiv.org/abs/2301.05948):

```bibtex
@inproceedings{sileo-2024-tasksource,
    title = "tasksource: A Large Collection of {NLP} tasks with a Structured Dataset Preprocessing Framework",
    author = "Sileo, Damien",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1361",
    pages = "15655--15684",
}
```