Commit caeff28 by sileod (1 parent: 9905ec5)

Update README.md

Files changed (1):
  1. README.md +9 -9
README.md CHANGED
@@ -227,10 +227,9 @@ library_name: transformers
 
 # Model Card for DeBERTa-v3-base-tasksource-nli
 
- DeBERTa-v3-base fine-tuned with multi-task learning on 520 tasks of the [tasksource collection](https://github.com/sileod/tasksource/)
+ DeBERTa-v3-base fine-tuned with multi-task learning on 560 tasks of the [tasksource collection](https://github.com/sileod/tasksource/)
 This checkpoint has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI) and can be used in a zero-shot NLI pipeline (similar to bart-mnli, but better).
- You can further fine-tune this model to use it for any classification or multiple-choice task.
- The untuned model's CLS embedding also has strong linear-probing performance (90% on MNLI), due to the multi-task training.
+ You can also further fine-tune this model for any classification or multiple-choice task.
 
 This is the shared model with the MNLI classifier on top. Its encoder was trained on many datasets, including bigbench, Anthropic rlhf and anli, alongside many NLI and classification tasks, all with one shared encoder.
 Each task had its own CLS embedding, which was dropped 10% of the time during training so that the model can also be used without it. All multiple-choice tasks used the same classification layers. For classification tasks, classification heads shared weights whenever their labels matched.
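The zero-shot NLI pipeline mentioned in this hunk uses the standard transformers API; a minimal sketch (the hub id `sileod/deberta-v3-base-tasksource-nli` is inferred from the model-card title, and the input text and candidate labels are only illustrative):

```python
# Minimal sketch of the zero-shot NLI pipeline described above.
# The hub id is an assumption based on the model-card title.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="sileod/deberta-v3-base-tasksource-nli",
)

result = classifier(
    "The team released a new open-source inference library.",
    candidate_labels=["technology", "sports", "politics"],
)
print(result)
# -> {'sequence': ..., 'labels': [... sorted by score ...], 'scores': [...]}
```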
@@ -251,13 +250,9 @@ pipe(['That movie was great !', 'Awful movie.'])
 # [{'label': 'positive', 'score': 0.9956129789352417}, {'label': 'negative', 'score': 0.9967049956321716}]
 ```
 
- ### Software
- https://github.com/sileod/tasksource/ \
- https://github.com/sileod/tasknet/ \
- Training took 7 days on an RTX 6000 24GB GPU.
 
- ## Model Recycling
- This model ranked 1st among all models with the microsoft/deberta-v3-base architecture.
+ ## Evaluation
+ This model ranked 1st among all models with the microsoft/deberta-v3-base architecture, according to the IBM Model Recycling evaluation.
 Results:
 [Evaluation on 36 datasets](https://ibm.github.io/model-recycling/model_gain_chart?avg=1.41&mnli_lp=nan&20_newsgroup=0.63&ag_news=0.46&amazon_reviews_multi=-0.40&anli=0.94&boolq=2.55&cb=10.71&cola=0.49&copa=10.60&dbpedia=0.10&esnli=-0.25&financial_phrasebank=1.31&imdb=-0.17&isear=0.63&mnli=0.42&mrpc=-0.23&multirc=1.73&poem_sentiment=0.77&qnli=0.12&qqp=-0.05&rotten_tomatoes=0.67&rte=2.13&sst2=0.01&sst_5bins=-0.02&stsb=1.39&trec_coarse=0.24&trec_fine=0.18&tweet_ev_emoji=0.62&tweet_ev_emotion=0.43&tweet_ev_hate=1.84&tweet_ev_irony=1.43&tweet_ev_offensive=0.17&tweet_ev_sentiment=0.08&wic=-1.78&wnli=3.03&wsc=9.95&yahoo_answers=0.17&model_name=sileod%2Fdeberta-v3-base_tasksource-420&base_name=microsoft%2Fdeberta-v3-base) using sileod/deberta-v3-base_tasksource-420 as a base model yields an average score of 80.45, compared to 79.04 for microsoft/deberta-v3-base.
 
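The `pipe` object in the hunk context above is not constructed in this excerpt. Given the per-task classification heads described earlier, it is presumably a task-specific pipeline loaded through tasknet; a sketch under that assumption (the `tn.load_pipeline` call and the `glue/sst2` task id are assumptions, not shown in this diff):

```python
# Presumable setup for the `pipe` used above. Assumption: tasknet's
# load_pipeline restores the task-specific CLS embedding and head.
import tasknet as tn

pipe = tn.load_pipeline(
    "sileod/deberta-v3-base-tasksource-nli",  # shared encoder checkpoint
    "glue/sst2",                              # assumed task id for a sentiment head
)
pipe(["That movie was great !", "Awful movie."])
# [{'label': 'positive', 'score': ...}, {'label': 'negative', 'score': ...}]
```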
 
@@ -270,6 +265,11 @@ Results:
 
 For more information, see: [Model Recycling](https://ibm.github.io/model-recycling/)
 
+ ### Software
+ https://github.com/sileod/tasksource/ \
+ https://github.com/sileod/tasknet/ \
+ Training took 7 days on an RTX 6000 24GB GPU.
+
 
 # Citation
 
 
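The tasksource library referenced in the Software section exposes the training tasks as Hugging Face datasets; a minimal sketch of browsing and loading one (the function names `list_tasks` and `load_task` follow the tasksource project README and should be treated as assumptions here, as is the `glue/rte` task id):

```python
# Sketch: enumerate the tasksource tasks and load one as a dataset.
# list_tasks() and load_task() are assumed from the tasksource README.
from tasksource import list_tasks, load_task

tasks = list_tasks()             # pandas DataFrame of available tasks
print(len(tasks), "tasks available")

dataset = load_task("glue/rte")  # assumed task id; yields standardized splits
print(dataset)
```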