sileod committed on
Commit 7e82d79
1 Parent(s): 644cdf1

Update README.md

Files changed (1)
  1. README.md +5 -7
README.md CHANGED
@@ -269,11 +269,10 @@ pipeline_tag: zero-shot-classification
 
  # Model Card for DeBERTa-v3-base-tasksource-nli
 
- This is [DeBERTa-v3-base](https://hf.co/microsoft/deberta-v3-base) fine-tuned with multi-task learning on 560 tasks of the [tasksource collection](https://github.com/sileod/tasksource/)
+ This is [DeBERTa-v3-base](https://hf.co/microsoft/deberta-v3-base) fine-tuned with multi-task learning on 560 tasks of the [tasksource collection](https://github.com/sileod/tasksource/).
  This checkpoint has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI), and can be used for:
- - Natural language inference
+ - Natural language inference, and many other tasks with tasksource-adapters, see [TA]
  - Zero-shot entailment-based classification pipeline (similar to bart-mnli), see [ZS].
- - Many other tasks with tasksource-adapters, see [TA]
  - Further fine-tuning for new tasks (classification, token classification or multiple-choice).
 
  # [ZS] Zero-shot classification pipeline
@@ -307,16 +306,15 @@ https://ibm.github.io/model-recycling/
  https://github.com/sileod/tasksource/ \
  https://github.com/sileod/tasknet/ \
  Training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing
- Training took 7 days on RTX6000 24GB gpu.
 
- This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice model used the same classification layers. For classification tasks, models shared weights if their labels matched.
- The number of examples per task was capped to 64k. The model was trained for 45k steps with a batch size of 384, and a peak learning rate of 2e-5.
 
+ This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice models used the same classification layers. For classification tasks, models shared weights if their labels matched.
+ The number of examples per task was capped to 64k. The model was trained for 100k steps with a batch size of 384, and a peak learning rate of 2e-5. Training took 7 days on an RTX6000 24GB GPU.
 
  # Citation
 
  More details in this [article](https://arxiv.org/abs/2301.05948):
- ```bib
+ ```
  @article{sileo2023tasksource,
  title={tasksource: Structured Dataset Preprocessing Annotations for Frictionless Extreme Multi-Task Learning and Evaluation},
  author={Sileo, Damien},
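
The README's [ZS] section itself is not part of this hunk, so as a reference, here is a minimal sketch of the zero-shot entailment-based classification usage mentioned above. It assumes the repo id `sileod/deberta-v3-base-tasksource-nli` and the stock `transformers` zero-shot-classification pipeline; the example text and labels are illustrative.

```python
# Minimal sketch of the [ZS] usage; repo id, text, and labels are assumptions, not from this diff.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="sileod/deberta-v3-base-tasksource-nli",  # assumed repo id
)

result = classifier(
    "One day I will see the world.",                     # sequence to classify
    candidate_labels=["travel", "cooking", "dancing"],   # illustrative labels
)
# The pipeline returns the candidate labels ranked by entailment probability.
print(result["labels"], result["scores"])
```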
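The "further fine-tuning" bullet can likewise be illustrated with a short sketch. The dataset, label count, and hyperparameters below are placeholders (not the 384 / 2e-5 / 100k setup from the card); `ignore_mismatched_sizes=True` replaces the 3-label NLI head with a freshly initialized classification head.

```python
# Hedged sketch of fine-tuning the checkpoint on a new 2-label classification task.
# Dataset and hyperparameters are illustrative placeholders, not the card's training setup.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "sileod/deberta-v3-base-tasksource-nli"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=2,                  # the new task has 2 labels, not the 3 NLI labels
    ignore_mismatched_sizes=True,  # drop the NLI head, initialize a fresh one
)

dataset = load_dataset("imdb")     # placeholder binary classification dataset

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=8, num_train_epochs=1),
    train_dataset=dataset["train"].shuffle(seed=0).select(range(2000)),  # small subset for the sketch
    tokenizer=tokenizer,           # enables dynamic padding via the default collator
)
trainer.train()
```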