sileod committed on
Commit 7e82d79
1 Parent(s): 644cdf1

Update README.md

Files changed (1)
  1. README.md +5 -7
README.md CHANGED
@@ -269,11 +269,10 @@ pipeline_tag: zero-shot-classification
 
  # Model Card for DeBERTa-v3-base-tasksource-nli
 
- This is [DeBERTa-v3-base](https://hf.co/microsoft/deberta-v3-base) fine-tuned with multi-task learning on 560 tasks of the [tasksource collection](https://github.com/sileod/tasksource/)
+ This is [DeBERTa-v3-base](https://hf.co/microsoft/deberta-v3-base) fine-tuned with multi-task learning on 560 tasks of the [tasksource collection](https://github.com/sileod/tasksource/).
  This checkpoint has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI), and can be used for:
- - Natural language inference
+ - Natural language inference, and many other tasks with tasksource-adapters, see [TA]
  - Zero-shot entailment-based classification pipeline (similar to bart-mnli), see [ZS].
- - Many other tasks with tasksource-adapters, see [TA]
  - Further fine-tuning for new tasks (classification, token classification or multiple-choice).
 
  # [ZS] Zero-shot classification pipeline
@@ -307,16 +306,15 @@ https://ibm.github.io/model-recycling/
  https://github.com/sileod/tasksource/ \
  https://github.com/sileod/tasknet/ \
  Training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing
- Training took 7 days on RTX6000 24GB gpu.
 
- This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice model used the same classification layers. For classification tasks, models shared weights if their labels matched.
- The number of examples per task was capped to 64k. The model was trained for 45k steps with a batch size of 384, and a peak learning rate of 2e-5.
 
+ This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice models used the same classification layers. For classification tasks, models shared weights if their labels matched.
+ The number of examples per task was capped to 64k. The model was trained for 100k steps with a batch size of 384, and a peak learning rate of 2e-5. Training took 7 days on an RTX6000 24GB GPU.
 
  # Citation
 
  More details in this [article](https://arxiv.org/abs/2301.05948):
- ```bib
+ ```
  @article{sileo2023tasksource,
  title={tasksource: Structured Dataset Preprocessing Annotations for Frictionless Extreme Multi-Task Learning and Evaluation},
  author={Sileo, Damien},
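
The README's [ZS] section itself is not part of this hunk, so as a reference, here is a minimal sketch of the zero-shot entailment-based classification usage mentioned above. It assumes the repo id `sileod/deberta-v3-base-tasksource-nli` and the stock `transformers` zero-shot-classification pipeline; the example text and labels are illustrative.

```python
# Minimal sketch of the [ZS] usage; repo id, text, and labels are assumptions, not from this diff.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="sileod/deberta-v3-base-tasksource-nli",  # assumed repo id
)

result = classifier(
    "One day I will see the world.",                     # sequence to classify
    candidate_labels=["travel", "cooking", "dancing"],   # illustrative labels
)
# The pipeline returns the candidate labels ranked by entailment probability.
print(result["labels"], result["scores"])
```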
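The "further fine-tuning" bullet can likewise be illustrated with a short sketch. The dataset, label count, and hyperparameters below are placeholders (not the 384 / 2e-5 / 100k setup from the card); `ignore_mismatched_sizes=True` replaces the 3-label NLI head with a freshly initialized classification head.

```python
# Hedged sketch of fine-tuning the checkpoint on a new 2-label classification task.
# Dataset and hyperparameters are illustrative placeholders, not the card's training setup.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "sileod/deberta-v3-base-tasksource-nli"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=2,                  # the new task has 2 labels, not the 3 NLI labels
    ignore_mismatched_sizes=True,  # drop the NLI head, initialize a fresh one
)

dataset = load_dataset("imdb")     # placeholder binary classification dataset

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=8, num_train_epochs=1),
    train_dataset=dataset["train"].shuffle(seed=0).select(range(2000)),  # small subset for the sketch
    tokenizer=tokenizer,           # enables dynamic padding via the default collator
)
trainer.train()
```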