MoritzLaurer (HF staff) committed
Commit d5928fd
1 Parent(s): e0861d5

Update README.md

Files changed (1)
  1. README.md +30 -54
README.md CHANGED
@@ -1,70 +1,46 @@
  ---
- license: mit
  base_model: microsoft/deberta-v3-xsmall
  tags:
- - generated_from_trainer
- metrics:
- - accuracy
- model-index:
- - name: deberta-v3-xsmall-zeroshot-v1.1-none
-   results: []
  ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # deberta-v3-xsmall-zeroshot-v1.1-none
-
- This model is a fine-tuned version of [microsoft/deberta-v3-xsmall](https://huggingface.co/microsoft/deberta-v3-xsmall) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.2072
- - F1 Macro: 0.6369
- - F1 Micro: 0.7013
- - Accuracy Balanced: 0.6751
- - Accuracy: 0.7013
- - Precision Macro: 0.6439
- - Recall Macro: 0.6751
- - Precision Micro: 0.7013
- - Recall Micro: 0.7013
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 32
- - eval_batch_size: 128
- - seed: 42
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_ratio: 0.06
- - num_epochs: 3
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | F1 Macro | F1 Micro | Accuracy Balanced | Accuracy | Precision Macro | Recall Macro | Precision Micro | Recall Micro |
- |:-------------:|:-----:|:-----:|:---------------:|:--------:|:--------:|:-----------------:|:--------:|:---------------:|:------------:|:---------------:|:------------:|
- | 0.2532 | 1.0 | 30790 | 0.4006 | 0.8198 | 0.8384 | 0.8151 | 0.8384 | 0.8257 | 0.8151 | 0.8384 | 0.8384 |
- | 0.2113 | 2.0 | 61580 | 0.3907 | 0.8254 | 0.8439 | 0.8198 | 0.8439 | 0.8326 | 0.8198 | 0.8439 | 0.8439 |
- | 0.1727 | 3.0 | 92370 | 0.4228 | 0.8306 | 0.8461 | 0.8297 | 0.8461 | 0.8315 | 0.8297 | 0.8461 | 0.8461 |
-
- ### Framework versions
-
- - Transformers 4.33.3
- - Pytorch 2.1.2+cu121
- - Datasets 2.14.7
- - Tokenizers 0.13.3
 
  ---
  base_model: microsoft/deberta-v3-xsmall
+ language:
+ - en
  tags:
+ - text-classification
+ - zero-shot-classification
+ pipeline_tag: zero-shot-classification
+ library_name: transformers
+ license: mit
  ---
+
+ # deberta-v3-xsmall-zeroshot-v1.1-all-33
+
+ This model was fine-tuned using the same pipeline as described in
+ the model card for [MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33](https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33)
+ and in this [paper](https://arxiv.org/pdf/2312.17543.pdf).
+
+ The foundation model is [microsoft/deberta-v3-xsmall](https://huggingface.co/microsoft/deberta-v3-xsmall).
+ The model has only 22 million backbone parameters and 128 million vocabulary parameters.
+ The backbone parameters are the main parameters active during inference, and their small number provides a significant speedup over larger models.
+ The model is only 241 MB in size.
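A minimal sketch of how the backbone vs. vocabulary parameter split can be inspected with transformers, assuming the repo id matches the model name in the heading above:

```python
from transformers import AutoModelForSequenceClassification

# Repo id assumed from the model name in this card.
model = AutoModelForSequenceClassification.from_pretrained(
    "MoritzLaurer/deberta-v3-xsmall-zeroshot-v1.1-all-33"
)

# The input embedding matrix holds the vocabulary parameters;
# everything else is the backbone (plus the small classification head).
vocab_params = model.get_input_embeddings().weight.numel()
total_params = sum(p.numel() for p in model.parameters())

print(f"vocabulary params: {vocab_params / 1e6:.1f}M")
print(f"backbone + head params: {(total_params - vocab_params) / 1e6:.1f}M")
```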
+
+ This model was trained to provide a small and highly efficient zeroshot option,
+ especially for edge devices or in-browser use cases with transformers.js.
+
+ ## Usage and other details
+
+ For usage instructions and other details, refer to
+ the model card for [MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33](https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33)
+ and this [paper](https://arxiv.org/pdf/2312.17543.pdf).
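A minimal usage sketch, assuming the standard transformers zero-shot-classification pipeline declared in the frontmatter; the example text and candidate labels are placeholders:

```python
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/deberta-v3-xsmall-zeroshot-v1.1-all-33",  # repo id assumed from the heading
)

text = "Angela Merkel is a politician in Germany and leader of the CDU"
candidate_labels = ["politics", "economy", "entertainment", "environment"]

# Returns a dict with 'labels' and 'scores' sorted from most to least likely.
output = classifier(text, candidate_labels, multi_label=False)
print(output)
```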
+
+ ## Metrics
+
+ I did not do zeroshot evaluation for this model, in order to save time and compute.
+ The table below shows standard accuracy for all datasets the model was trained on (note that the NLI datasets are binary).
+
+ General takeaway: the model is much more efficient than its larger sisters, but it performs less well.
+
+ |Datasets|mnli_m|mnli_mm|fevernli|anli_r1|anli_r2|anli_r3|wanli|lingnli|wellformedquery|rottentomatoes|amazonpolarity|imdb|yelpreviews|hatexplain|massive|banking77|emotiondair|emocontext|empathetic|agnews|yahootopics|biasframes_sex|biasframes_offensive|biasframes_intent|financialphrasebank|appreviews|hateoffensive|trueteacher|spam|wikitoxic_toxicaggregated|wikitoxic_obscene|wikitoxic_identityhate|wikitoxic_threat|wikitoxic_insult|manifesto|capsotu|
+ | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+ |Accuracy|0.925|0.923|0.886|0.732|0.633|0.661|0.814|0.887|0.722|0.872|0.944|0.925|0.967|0.774|0.734|0.627|0.762|0.745|0.465|0.888|0.702|0.94|0.853|0.863|0.914|0.926|0.921|0.635|0.968|0.897|0.918|0.915|0.935|0.9|0.505|0.701|
+ |Inference text/sec (A100, batch=128)|1573.0|1630.0|683.0|1282.0|1352.0|1072.0|2325.0|2008.0|4781.0|2743.0|677.0|228.0|238.0|2357.0|5027.0|4323.0|3247.0|3129.0|941.0|1643.0|335.0|1517.0|1452.0|1498.0|2367.0|974.0|2634.0|353.0|2284.0|260.0|252.0|256.0|254.0|259.0|1941.0|2080.0|
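The throughput row was measured on an A100 at batch size 128. A rough sketch of how a comparable texts/sec figure could be measured with the transformers pipeline, not the author's benchmark code; texts and labels are placeholders:

```python
import time

from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/deberta-v3-xsmall-zeroshot-v1.1-all-33",  # repo id assumed
    device=0,  # the table above reports numbers for an A100 GPU
)

texts = ["This movie was absolutely wonderful."] * 1024  # placeholder corpus
candidate_labels = ["positive", "negative"]  # placeholder labels

start = time.perf_counter()
classifier(texts, candidate_labels, batch_size=128)
elapsed = time.perf_counter() - start

print(f"{len(texts) / elapsed:.0f} texts/sec")
```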
 