Tags: Zero-Shot Classification, Transformers, PyTorch, Safetensors, English, deberta-v2, text-classification, deberta-v3-base, deberta-v3, deberta, nli, natural-language-inference, multitask, multi-task, pipeline, extreme-multi-task, extreme-mtl, tasksource, zero-shot, rlhf, Eval Results, Inference Endpoints
sileod committed on
Commit a3c9b94
1 Parent(s): 1a78494

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -282,7 +282,7 @@ pipeline_tag: zero-shot-classification
 
 # Model Card for DeBERTa-v3-base-tasksource-nli
 
-This is [DeBERTa-v3-base](https://hf.co/microsoft/deberta-v3-base) fine-tuned with multi-task learning on 560 tasks of the [tasksource collection](https://github.com/sileod/tasksource/).
+This is [DeBERTa-v3-base](https://hf.co/microsoft/deberta-v3-base) fine-tuned with multi-task learning on 600 tasks of the [tasksource collection](https://github.com/sileod/tasksource/).
 This checkpoint has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI), and can be used for:
 - Zero-shot entailment-based classification pipeline (similar to bart-mnli), see [ZS].
 - Natural language inference, and many other tasks with tasksource-adapters, see [TA]
@@ -323,7 +323,7 @@ Training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olU
 
 
 This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice models used the same classification layers. For classification tasks, models shared weights if their labels matched.
-The number of examples per task was capped at 64k. The model was trained for 100k steps with a batch size of 384, and a peak learning rate of 2e-5. Training took 7 days on an RTX6000 24GB GPU.
+The number of examples per task was capped at 64k. The model was trained for 120k steps with a batch size of 384, and a peak learning rate of 2e-5. Training took 10 days on an RTX6000 24GB GPU.
 
 # Citation
 
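
The zero-shot entailment-based classification pipeline referenced in the card uses the standard transformers pipeline API. A minimal sketch, assuming the checkpoint is published under the repo id sileod/deberta-v3-base-tasksource-nli (inferred from the card title; not stated in this diff):

```python
# Minimal sketch of zero-shot classification with this checkpoint.
# The model id below is an assumption based on the model card title.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="sileod/deberta-v3-base-tasksource-nli",
)

result = classifier(
    "One day I will see the world.",
    candidate_labels=["travel", "cooking", "dancing"],
)
# Labels are returned sorted by score, highest first.
print(result["labels"][0], result["scores"][0])
```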