Update README.md

README.md CHANGED

@@ -282,7 +282,7 @@ pipeline_tag: zero-shot-classification

# Model Card for DeBERTa-v3-base-tasksource-nli

-This is [DeBERTa-v3-base](https://hf.co/microsoft/deberta-v3-base) fine-tuned with multi-task learning on
+This is [DeBERTa-v3-base](https://hf.co/microsoft/deberta-v3-base) fine-tuned with multi-task learning on 600 tasks of the [tasksource collection](https://github.com/sileod/tasksource/).
This checkpoint has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI), and can be used for:
- Zero-shot entailment-based classification pipeline (similar to bart-mnli), see [ZS].
- Natural language inference, and many other tasks with tasksource-adapters, see [TA]
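
As a usage note for the two items above, here is a minimal sketch with the `transformers` pipeline API. The repo id `sileod/deberta-v3-base-tasksource-nli` is an assumption (it is not stated in this diff); substitute the actual checkpoint path.

```python
# Hedged usage sketch; the repo id below is an assumption.
from transformers import pipeline

MODEL = "sileod/deberta-v3-base-tasksource-nli"  # assumed checkpoint id

# Entailment-based zero-shot classification (bart-mnli style), per [ZS].
zs = pipeline("zero-shot-classification", model=MODEL)
print(zs("I have a problem with my iphone that needs to be resolved asap!",
         candidate_labels=["urgent", "not urgent", "billing"]))

# Plain NLI: score a premise/hypothesis pair with the MNLI head.
nli = pipeline("text-classification", model=MODEL)
print(nli({"text": "A man is playing guitar.",
           "text_pair": "A person is making music."}))
```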
@@ -323,7 +323,7 @@ Training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olU

This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which was dropped 10% of the time so that the model can also be used without it. All multiple-choice models used the same classification layers. For classification tasks, models shared weights if their labels matched.
-The number of examples per task was capped to 64k. The model was trained for
+The number of examples per task was capped to 64k. The model was trained for 120k steps with a batch size of 384 and a peak learning rate of 2e-5. Training took 10 days on an RTX 6000 24GB GPU.

# Citation
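
The per-task CLS mechanism described in the hunk above can be made concrete with a toy sketch. This is illustrative only, not the actual tasksource training code, and the exact placement of the task embedding is an assumption:

```python
# Toy sketch (assumption): a per-task CLS embedding prepended to the token
# embeddings, skipped 10% of the time during training so the encoder also
# learns to work without it.
import torch
import torch.nn as nn

class TaskCLS(nn.Module):
    def __init__(self, num_tasks: int, hidden_size: int, drop_prob: float = 0.1):
        super().__init__()
        self.emb = nn.Embedding(num_tasks, hidden_size)
        self.drop_prob = drop_prob

    def forward(self, token_embeds: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq, hidden); task_id: (batch,)
        if self.training and torch.rand(()).item() < self.drop_prob:
            return token_embeds                      # 10%: drop the task CLS
        task_vec = self.emb(task_id).unsqueeze(1)    # (batch, 1, hidden)
        return torch.cat([task_vec, token_embeds], dim=1)

layer = TaskCLS(num_tasks=600, hidden_size=768).train()
out = layer(torch.randn(2, 16, 768), torch.tensor([3, 3]))
print(out.shape)  # (2, 17, 768) when the task CLS is kept
```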
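
The reported schedule (120k steps, batch size 384, peak learning rate 2e-5) maps onto `TrainingArguments` roughly as below. This is a hedged sketch: the actual run used the tasksource training code linked in the second hunk header, and the scheduler and warmup values here are assumptions.

```python
# Hedged sketch of the reported hyperparameters; scheduler/warmup are
# assumptions, not stated in the diff.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="deberta-v3-base-tasksource-nli",
    max_steps=120_000,                # "trained for 120k steps"
    per_device_train_batch_size=384,  # "batch size of 384" (single-GPU run)
    learning_rate=2e-5,               # "peak learning rate of 2e-5"
    lr_scheduler_type="linear",       # assumption
    warmup_ratio=0.06,                # assumption
)
```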