Zero-Shot Classification
Transformers
PyTorch
Safetensors
English
deberta-v2
text-classification
deberta-v3-base
deberta-v3
deberta
nli
natural-language-inference
multitask
multi-task
pipeline
extreme-multi-task
extreme-mtl
tasksource
zero-shot
rlhf
Eval Results
Inference Endpoints
sileod committed
Commit 0a05970
1 Parent(s): 27f54b8

Update README.md

Files changed (1):
  1. README.md +5 -6
README.md CHANGED
@@ -238,13 +238,14 @@ You can also load other tasks (see next paragraph) or further fine-tune the enco
 import tasknet as tn
 pipe = tn.load_pipeline('sileod/deberta-v3-base-tasksource-nli','glue/sst2') # works for 500+ tasksource tasks
 pipe(['That movie was great !', 'Awful movie.'])
-
-# [{'label': 'positive', 'score': 0.9956129789352417}, {'label': 'negative', 'score': 0.9967049956321716}]
+# [{'label': 'positive', 'score': 0.9956}, {'label': 'negative', 'score': 0.9967}]
 ```
 
+The list of tasks is available in the model's config.json.
+
 
 ## Evaluation
-This model ranked 1st among all models with the microsoft/deberta-v3-base architecture according to IBM model recycling evaluation.
+This model ranked 1st among all models with the microsoft/deberta-v3-base architecture according to the IBM model recycling evaluation.
 Results:
 [Evaluation on 36 datasets](https://ibm.github.io/model-recycling/model_gain_chart?avg=1.41&mnli_lp=nan&20_newsgroup=0.63&ag_news=0.46&amazon_reviews_multi=-0.40&anli=0.94&boolq=2.55&cb=10.71&cola=0.49&copa=10.60&dbpedia=0.10&esnli=-0.25&financial_phrasebank=1.31&imdb=-0.17&isear=0.63&mnli=0.42&mrpc=-0.23&multirc=1.73&poem_sentiment=0.77&qnli=0.12&qqp=-0.05&rotten_tomatoes=0.67&rte=2.13&sst2=0.01&sst_5bins=-0.02&stsb=1.39&trec_coarse=0.24&trec_fine=0.18&tweet_ev_emoji=0.62&tweet_ev_emotion=0.43&tweet_ev_hate=1.84&tweet_ev_irony=1.43&tweet_ev_offensive=0.17&tweet_ev_sentiment=0.08&wic=-1.78&wnli=3.03&wsc=9.95&yahoo_answers=0.17&model_name=sileod%2Fdeberta-v3-base_tasksource-420&base_name=microsoft%2Fdeberta-v3-base) using sileod/deberta-v3-base_tasksource-420 as a base model yields an average score of 80.45, compared to 79.04 for microsoft/deberta-v3-base.
 
@@ -254,20 +255,18 @@ Results:
 |---------------:|----------:|-----------------------:|--------:|--------:|--------:|--------:|-------:|----------:|--------:|-----------------------:|-------:|--------:|--------:|--------:|----------:|-----------------:|--------:|--------:|------------------:|--------:|--------:|------------:|--------:|--------------:|------------:|-----------------:|-------------------:|----------------:|-----------------:|---------------------:|---------------------:|--------:|--------:|--------:|----------------:|
 | 87.042 | 90.9 | 66.46 | 59.7188 | 85.5352 | 85.7143 | 87.0566 | 69 | 79.5333 | 91.6735 | 85.8 | 94.324 | 72.4902 | 90.2055 | 88.9706 | 63.9851 | 87.5 | 93.6299 | 91.7363 | 91.0882 | 84.4765 | 95.0688 | 56.9683 | 91.6654 | 98 | 91.2 | 46.814 | 84.3772 | 58.0471 | 81.25 | 85.2326 | 71.8821 | 69.4357 | 73.2394 | 74.0385 | 72.2 |
 
-
 For more information, see: [Model Recycling](https://ibm.github.io/model-recycling/)
 
 ### Software and training details
 https://github.com/sileod/tasksource/ \
 https://github.com/sileod/tasknet/ \
-Training took 7 days on RTX6000 24GB gpu.
 Training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing
+Training took 7 days on an RTX 6000 24GB GPU.
 
 This is the shared model with the MNLI classifier on top. Its encoder was trained on many datasets, including bigbench, Anthropic rlhf, anli, and others, alongside many NLI and classification tasks, all with one shared encoder.
 Each task had a specific CLS embedding, which was dropped 10% of the time to facilitate model use without it. All multiple-choice models used the same classification layers. For classification tasks, models shared weights if their labels matched.
 The number of examples per task was capped to 64k. The model was trained for 45k steps with a batch size of 384 and a peak learning rate of 2e-5.
 
-The list of tasks is available in model config.
 
 # Citation
 
 
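The snippet in the diff above loads a task-specific head through tasknet. Since the shared model ships with an MNLI classifier on top, it can also be used for NLI-based zero-shot classification through the standard transformers pipeline. A minimal sketch, assuming transformers is installed; the input text and candidate labels are illustrative:

```python
# Illustrative zero-shot usage via the Hugging Face pipeline API.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="sileod/deberta-v3-base-tasksource-nli",
)

# The candidate labels here are chosen for illustration; any label set works.
print(classifier(
    "That movie was great !",
    candidate_labels=["positive", "negative"],
))
# -> {'sequence': ..., 'labels': ['positive', 'negative'], 'scores': [...]}
```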
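The training-details paragraph above describes a per-task CLS embedding that is dropped 10% of the time so the encoder remains usable without it. A minimal PyTorch sketch of that mechanism, with invented module and argument names; this is an illustration under assumptions, not the actual tasknet implementation:

```python
# Hypothetical sketch of a per-task CLS embedding with a 10% drop rate.
import torch
import torch.nn as nn

class TaskCLS(nn.Module):
    def __init__(self, n_tasks: int, hidden_size: int, drop_prob: float = 0.1):
        super().__init__()
        # One learned CLS vector per task (names are illustrative).
        self.task_cls = nn.Embedding(n_tasks, hidden_size)
        self.drop_prob = drop_prob

    def forward(self, token_embeddings: torch.Tensor, task_id: int) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden_size)
        if self.training and torch.rand(()) < self.drop_prob:
            # 10% of the time: skip the task-specific CLS so the encoder
            # also learns to work without it.
            return token_embeddings
        idx = torch.tensor(task_id, device=token_embeddings.device)
        cls = self.task_cls(idx).expand(token_embeddings.size(0), 1, -1)
        return torch.cat([cls, token_embeddings], dim=1)
```

The same paragraph notes that classification heads were shared between tasks with matching label sets; in a sketch like this, that would amount to keeping one head per distinct label set rather than one per task.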
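The reported training setup (examples capped at 64k per task, 45k steps, batch size 384, peak learning rate 2e-5) can be written down roughly as follows. The helper name and the use of datasets/TrainingArguments are assumptions made here for illustration; the actual training code is the Colab linked above:

```python
# Rough sketch of the reported hyperparameters; not the original training script.
from datasets import load_dataset
from transformers import TrainingArguments

MAX_EXAMPLES_PER_TASK = 64_000  # cap reported in the card

def load_capped(task_name: str):
    """Load one task's train split and cap it at 64k examples (illustrative)."""
    ds = load_dataset(*task_name.split("/"))["train"].shuffle(seed=0)
    return ds.select(range(min(len(ds), MAX_EXAMPLES_PER_TASK)))

sst2_train = load_capped("glue/sst2")  # e.g. one of the 500+ tasksource tasks

# Hyperparameters reported in the card; every other argument is an assumption.
args = TrainingArguments(
    output_dir="deberta-v3-base-tasksource",
    max_steps=45_000,
    per_device_train_batch_size=384,  # card reports an overall batch size of 384
    learning_rate=2e-5,               # peak learning rate
)
```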