Commit 7cc35b4 by wissamantoun
1 Parent(s): 65a1089

added model files

Files changed (33)
  1. .gitattributes +1 -0
  2. README.md +101 -0
  3. added_tokens.json +1 -0
  4. config.json +38 -0
  5. runs/.gitkeep +0 -0
  6. runs/eval/events.out.tfevents.1658662619.nefgpu54.32602.453.v2 +3 -0
  7. runs/eval/events.out.tfevents.1658780993.nefgpu54.28912.453.v2 +3 -0
  8. runs/eval/events.out.tfevents.1658782266.nefgpu54.32401.453.v2 +3 -0
  9. runs/eval/events.out.tfevents.1658814681.nefgpu54.80450.453.v2 +3 -0
  10. runs/eval/events.out.tfevents.1658822505.nefgpu54.33886.453.v2 +3 -0
  11. runs/eval/events.out.tfevents.1658978757.nefgpu54.33349.455.v2 +3 -0
  12. runs/eval/events.out.tfevents.1659006537.nefgpu54.32289.455.v2 +3 -0
  13. runs/eval/events.out.tfevents.1659006973.nefgpu54.34472.455.v2 +3 -0
  14. runs/eval/events.out.tfevents.1659133070.nefgpu54.1618.455.v2 +3 -0
  15. runs/eval/events.out.tfevents.1659427595.nefgpu54.33725.455.v2 +3 -0
  16. runs/eval/events.out.tfevents.1659993621.nefgpu54.15278.455.v2 +3 -0
  17. runs/train/p1/events.out.tfevents.1658662619.nefgpu54.32602.461.v2 +3 -0
  18. runs/train/p1/events.out.tfevents.1658780993.nefgpu54.28912.461.v2 +3 -0
  19. runs/train/p1/events.out.tfevents.1658782266.nefgpu54.32401.461.v2 +3 -0
  20. runs/train/p1/events.out.tfevents.1658814681.nefgpu54.80450.461.v2 +3 -0
  21. runs/train/p1/events.out.tfevents.1658822505.nefgpu54.33886.461.v2 +3 -0
  22. runs/train/p1/events.out.tfevents.1658978757.nefgpu54.33349.463.v2 +3 -0
  23. runs/train/p1/events.out.tfevents.1659006537.nefgpu54.32289.463.v2 +3 -0
  24. runs/train/p1/events.out.tfevents.1659006973.nefgpu54.34472.463.v2 +3 -0
  25. runs/train/p2/events.out.tfevents.1659133070.nefgpu54.1618.463.v2 +3 -0
  26. runs/train/p2/events.out.tfevents.1659427595.nefgpu54.33725.463.v2 +3 -0
  27. runs/train/p2/events.out.tfevents.1659993621.nefgpu54.15278.463.v2 +3 -0
  28. runs/training_summary.txt +24 -0
  29. special_tokens_map.json +1 -0
  30. spm.model +3 -0
  31. tf_model.h5 +3 -0
  32. tokenizer.json +0 -0
  33. tokenizer_config.json +16 -0
.gitattributes CHANGED
@@ -32,3 +32,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.ckpt-* filter=lfs diff=lfs merge=lfs -text
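This hunk adds `*.ckpt-*` to the patterns whose files Git LFS stores as pointers. The sketch below shows which paths such patterns would catch. It is a rough approximation under stated assumptions. It matches each pattern against the file's basename with Python's `fnmatch.fnmatchcase`. Real gitattributes matching follows gitignore-style rules: for example, patterns that contain a slash are anchored to the directory of the `.gitattributes` file. The pattern list holds only the patterns visible in this hunk, and the checkpoint file name in the demo is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Only the patterns visible in this hunk, not the whole .gitattributes file.
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "*.ckpt-*"]


def tracked_by_lfs(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: slash-free patterns are matched against the basename."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pat) for pat in patterns)


if __name__ == "__main__":
    for p in [
        "runs/train/p1/events.out.tfevents.1658662619.nefgpu54.32602.461.v2",
        "model.ckpt-10000.index",  # hypothetical TF checkpoint file name
        "config.json",
    ]:
        print(p, tracked_by_lfs(p))
```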
README.md CHANGED
@@ -1,3 +1,104 @@
  ---
  license: mit
+ language: fr
+ datasets:
+ - ccnet
+ tags:
+ - deberta
+ - deberta-v3
  ---
+
+ # CamemBERTa: A French language model based on DeBERTa V3
+
+ CamemBERTa is a French language model based on DeBERTa V3, which is DeBERTa V2 with ELECTRA-style pretraining using the Replaced Token Detection (RTD) objective.
+ RTD uses a generator model, trained with the MLM objective, to replace masked tokens with plausible candidates, and a discriminator model trained to detect which tokens were replaced by the generator.
+ Usually, the generator and discriminator share the same embedding matrix, but the authors of DeBERTa V3 propose a new technique, gradient-disentangled embedding sharing (GDES), that disentangles the gradients of the shared embedding between the generator and the discriminator.
+
+ *This is the first publicly available implementation of DeBERTa V3, and the first publicly available DeBERTa V3 model outside of the original Microsoft release.*
+
+ Preprint Paper: https://inria.hal.science/hal-03963729/
+
+ Pre-training Code: https://gitlab.inria.fr/almanach/CamemBERTa
+
+ ## How to use CamemBERTa
+ Our pretrained weights are available on the Hugging Face model hub; you can load them with the following code:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel, AutoModelForMaskedLM
+
+ # Discriminator: the main CamemBERTa encoder
+ CamemBERTa = AutoModel.from_pretrained("almanach/camemberta-base")
+ tokenizer = AutoTokenizer.from_pretrained("almanach/camemberta-base")
+
+ # Generator, loaded with its masked-language-modeling head
+ CamemBERTa_gen = AutoModelForMaskedLM.from_pretrained("almanach/camemberta-base-generator")
+ tokenizer_gen = AutoTokenizer.from_pretrained("almanach/camemberta-base-generator")
+ ```
+
+ We also include the TF2 weights, including the weights of the RTD head for the discriminator and of the MLM head for the generator.
+ CamemBERTa is compatible with most fine-tuning scripts from the transformers library.
+
+ ## Pretraining Setup
+
+ The model was trained on the French subset of the CCNet corpus (the same subset used in CamemBERT and PaGNOL) and is available on the Hugging Face model hub as CamemBERTa and CamemBERTa Generator.
+ To speed up the pre-training experiments, pre-training was split into two phases.
+ In phase 1, the model was trained with a maximum sequence length of 128 tokens for 10,000 steps, with 2,000 warm-up steps and a very large batch size of 67,584.
+ In phase 2, the maximum sequence length was increased to the model's full capacity of 512 tokens for 3,300 steps, with 200 warm-up steps and a batch size of 27,648.
+ In total, the model saw 133B tokens, compared to 419B tokens for CamemBERT-CCNet, which was trained for 100K steps; this represents roughly 30% of CamemBERT's full training.
+ For a fair comparison, we trained a RoBERTa model, CamemBERT (30%), with exactly the same pretraining setup but with the MLM objective.
+
+ ## Pretraining Loss Curves
+ See the TensorBoard logs and plots; the event files are in the `runs/` directory.
+
+ ## Fine-tuning results
+
+ Datasets: POS tagging and Dependency Parsing (GSD, Rhapsodie, Sequoia, FSMB), NER (FTB), the FLUE benchmark (XNLI, CLS, PAWS-X), and the French Question Answering Dataset (FQuAD).
+
+ | Model | UPOS | LAS | NER | CLS | PAWS-X | XNLI | F1 (FQuAD) | EM (FQuAD) |
+ |-------------------|-----------|-----------|-----------|-----------|-----------|-----------|------------|------------|
+ | CamemBERT (CCNet) | **97.59** | **88.69** | 89.97 | 94.62 | 91.36 | 81.95 | 80.98 | **62.51** |
+ | CamemBERT (30%) | 97.53 | 87.98 | **91.04** | 93.28 | 88.94 | 79.89 | 75.14 | 56.19 |
+ | CamemBERTa | 97.57 | 88.55 | 90.33 | **94.92** | **91.67** | **82.00** | **81.15** | 62.01 |
+
+ The following table compares CamemBERTa's XNLI accuracy with that of other models trained under different setups, illustrating CamemBERTa's data efficiency.
+
+ | Model | XNLI (Acc.) | Training Steps | Tokens seen in pre-training | Dataset Size in Tokens |
+ |-------------------|-------------|----------------|-----------------------------|------------------------|
+ | mDeBERTa | 84.4 | 500k | 2T | 2.5T |
+ | CamemBERTa | 82.0 | 33k | 0.139T | 0.319T |
+ | XLM-R | 81.4 | 1.5M | 6T | 2.5T |
+ | CamemBERT - CCNet | 81.95 | 100k | 0.419T | 0.319T |
+
+ *Note: The CamemBERTa training steps were adjusted for a batch size of 8192.*
+
+ ## License
+
+ The public model weights are licensed under the MIT License.
+ The pre-training code is licensed under the Apache License 2.0.
+
+ ## Citation
+
+ Paper accepted to Findings of ACL 2023.
+
+ You can use the preprint citation for now:
+
+ ```
+ @article{antoun2023camemberta,
+ TITLE = {{Data-Efficient French Language Modeling with CamemBERTa}},
+ AUTHOR = {Antoun, Wissam and Sagot, Beno{\^i}t and Seddah, Djam{\'e}},
+ URL = {https://inria.hal.science/hal-03963729},
+ NOTE = {working paper or preprint},
+ YEAR = {2023},
+ MONTH = Jan,
+ PDF = {https://inria.hal.science/hal-03963729/file/French_DeBERTa___ACL_2023%20to%20be%20uploaded.pdf},
+ HAL_ID = {hal-03963729},
+ HAL_VERSION = {v1},
+ }
+ ```
+
+ ## Contact
+
+ Wissam Antoun: `wissam (dot) antoun (at) inria (dot) fr`
+
+ Benoit Sagot: `benoit (dot) sagot (at) inria (dot) fr`
+
+ Djame Seddah: `djame (dot) seddah (at) inria (dot) fr`
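The model card describes the RTD objective in prose. The toy sketch below uses no neural network and shows how the discriminator's training labels can be built. Masked positions are refilled by a stand-in "generator". Here that is a hypothetical `toy_generator` that samples uniformly from a candidate list, in place of the real MLM generator, and it ignores context. Each position is then labeled 1 if its token differs from the original and 0 otherwise. Following ELECTRA's convention, a sampled token that happens to equal the original counts as original. GDES and the actual models are not modeled here.

```python
import random
from typing import List, Sequence, Tuple

MASK = "[MASK]"


def mask_tokens(tokens: Sequence[str], mask_prob: float,
                rng: random.Random) -> Tuple[List[str], List[int]]:
    """Replace a random subset of positions with [MASK]; return the sequence and those positions."""
    positions = [i for i in range(len(tokens)) if rng.random() < mask_prob]
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK
    return masked, positions


def toy_generator(candidates: Sequence[str], rng: random.Random) -> str:
    """Hypothetical stand-in for the MLM generator: samples a candidate uniformly."""
    return rng.choice(list(candidates))


def corrupt_for_rtd(tokens: Sequence[str], positions: Sequence[int],
                    candidates: Sequence[str],
                    rng: random.Random) -> Tuple[List[str], List[int]]:
    """Refill masked positions with generator samples and build discriminator labels.

    Label 1 = replaced, 0 = original. A sample equal to the original token is
    labeled 0, following ELECTRA's convention.
    """
    corrupted = list(tokens)
    labels = [0] * len(tokens)
    for i in positions:
        sample = toy_generator(candidates, rng)
        corrupted[i] = sample
        labels[i] = int(sample != tokens[i])
    return corrupted, labels


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = ["le", "chat", "dort", "sur", "le", "canapé"]
    masked, positions = mask_tokens(sentence, mask_prob=0.3, rng=rng)
    corrupted, labels = corrupt_for_rtd(sentence, positions, ["chat", "chien", "mange", "lit"], rng)
    print(masked)     # what a real generator would see
    print(corrupted)  # what the discriminator sees
    print(labels)     # the discriminator is trained to predict these 0/1 labels
```

In the real setup, the discriminator's loss is computed over every position, not only the masked ones. This is one reason RTD is considered more sample-efficient than plain MLM.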
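The 133B-token figure in the Pretraining Setup section can be checked against the two phases' settings. This rests on two assumptions of ours, neither stated in the model card. First, the batch sizes count sequences. Second, every sequence is packed to the phase's maximum length.

```python
# Phase settings as given in the model card. Assumption: batch_size counts
# sequences, and every sequence is filled to seq_len tokens.
PHASES = [
    {"name": "phase 1", "steps": 10_000, "batch_size": 67_584, "seq_len": 128},
    {"name": "phase 2", "steps": 3_300, "batch_size": 27_648, "seq_len": 512},
]
CAMEMBERT_CCNET_TOKENS = 419_000_000_000  # 419B, as stated in the model card


def tokens_seen(phases) -> int:
    """Total tokens = sum over phases of steps * batch size * sequence length."""
    return sum(p["steps"] * p["batch_size"] * p["seq_len"] for p in phases)


if __name__ == "__main__":
    for p in PHASES:
        print(p["name"], p["steps"] * p["batch_size"] * p["seq_len"])
    total = tokens_seen(PHASES)
    print(f"total: {total / 1e9:.1f}B tokens")  # total: 133.2B tokens
    print(f"share of CamemBERT-CCNet: {total / CAMEMBERT_CCNET_TOKENS:.1%}")  # 31.8%
```

Under these assumptions the result matches the stated 133B tokens and "roughly 30%" of CamemBERT-CCNet's 419B.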
added_tokens.json ADDED
@@ -0,0 +1 @@
+ {"[UNK]": 32001, "[PAD]": 32002}
config.json ADDED
@@ -0,0 +1,38 @@
+ {
+ "amp": true,
+ "architectures": [
+ "DebertaV2ForMaskedLM"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "conv_act": "gelu",
+ "conv_kernel_size": 3,
+ "embedding_size": 768,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 256,
+ "initializer_range": 0.02,
+ "intermediate_size": 1024,
+ "layer_norm_eps": 1e-07,
+ "max_position_embeddings": 512,
+ "max_relative_positions": -1,
+ "model_name": "camemberta-base",
+ "model_type": "deberta-v2",
+ "norm_rel_ebd": "layer_norm",
+ "num_attention_heads": 4,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "pooler_dropout": 0,
+ "pooler_hidden_act": "gelu",
+ "pooler_hidden_size": 768,
+ "pos_att_type": [
+ "p2c",
+ "c2p"
+ ],
+ "position_biased_input": false,
+ "position_buckets": 256,
+ "relative_attention": true,
+ "share_att_key": true,
+ "transformers_version": "4.18.0.dev0",
+ "type_vocab_size": 0,
+ "vocab_size": 32008
+ }
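Two groups of settings in this config are worth unpacking. The first gives the per-head attention width, which is `hidden_size // num_attention_heads`. The second is the pair `position_buckets: 256` and `max_relative_positions: -1`. As far as we can tell, `-1` makes the Hugging Face DeBERTa-v2 code fall back to `max_position_embeddings` (512). The buckets then compress long relative distances logarithmically. The plain-Python sketch below mirrors our reading of `make_log_bucket_position` in the transformers `DebertaV2` implementation. The helper names are our own, and the library source is authoritative for the exact bucket boundaries.

```python
import math

# Values copied from config.json above.
CONFIG = {
    "hidden_size": 256,
    "num_attention_heads": 4,
    "max_position_embeddings": 512,
    "max_relative_positions": -1,
    "position_buckets": 256,
}


def head_dim(cfg: dict) -> int:
    """Per-head width of the attention layers."""
    return cfg["hidden_size"] // cfg["num_attention_heads"]


def max_relative(cfg: dict) -> int:
    """Assumption: a value < 1 falls back to max_position_embeddings."""
    m = cfg["max_relative_positions"]
    return cfg["max_position_embeddings"] if m < 1 else m


def log_bucket_position(relative_pos: int, bucket_size: int, max_position: int) -> int:
    """Keep distances up to bucket_size // 2 exact; compress longer ones logarithmically."""
    mid = bucket_size // 2
    abs_pos = abs(relative_pos)
    if abs_pos <= mid:
        return relative_pos
    scaled = math.log(abs_pos / mid) / math.log((max_position - 1) / mid) * (mid - 1)
    bucket = math.ceil(scaled) + mid
    return bucket if relative_pos > 0 else -bucket


if __name__ == "__main__":
    buckets, max_pos = CONFIG["position_buckets"], max_relative(CONFIG)
    print(head_dim(CONFIG))  # 64
    for d in (1, 100, 128, 200, 511, -511):
        print(d, "->", log_bucket_position(d, buckets, max_pos))
```

With these settings, every distance inside a 512-token window maps into the range -255 to 255.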
runs/.gitkeep ADDED
File without changes
runs/eval/events.out.tfevents.1658662619.nefgpu54.32602.453.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:abff33d04db4af03dd3c8fd067478e73a80d56dc326f7372a9cc18e8a776dc96
+ size 40
runs/eval/events.out.tfevents.1658780993.nefgpu54.28912.453.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:077bf001772d30e10dc5a0a725f4b82e27b4871249297630e8e6d80744c67cb7
+ size 40
runs/eval/events.out.tfevents.1658782266.nefgpu54.32401.453.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0e43bf47fdcca2a060107d13f5519364caa48fbc21da07fec23d4d7c2202e928
+ size 40
runs/eval/events.out.tfevents.1658814681.nefgpu54.80450.453.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:28f1dd62d5aa1b4a4f1aebf1e3b9a195dcbaf3403684d15ae022645d734358cd
+ size 40
runs/eval/events.out.tfevents.1658822505.nefgpu54.33886.453.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0e60ab04655c96abd22edca70f5f3a9b07ae7a2eba4be9064968339b2bc9f047
+ size 40
runs/eval/events.out.tfevents.1658978757.nefgpu54.33349.455.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8d53cf9d245fb7b4ce9b318701b924004b4315c6c1df5c56443f48ec2859a5f2
+ size 40
runs/eval/events.out.tfevents.1659006537.nefgpu54.32289.455.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a2831d6cc6060e6e01a9173ef21f2cd8388ff6e3a28be7ce52db5b7a97031de
+ size 40
runs/eval/events.out.tfevents.1659006973.nefgpu54.34472.455.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4e4126a80bf4195601d8ac0494a2d13bea3200698e79fbe31b58888fad2fd578
+ size 40
runs/eval/events.out.tfevents.1659133070.nefgpu54.1618.455.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:263099010447ac4ef8020ef5df72b0070baaeee41e257191ac5ed9310e73d5fd
+ size 40
runs/eval/events.out.tfevents.1659427595.nefgpu54.33725.455.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6412c40d4909dfccca3144c46e5bc815af8d12955cffd16610259230abe68f39
+ size 40
runs/eval/events.out.tfevents.1659993621.nefgpu54.15278.455.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:84a318839fe4bf4bc681eae19c894f0cdc49a532d70a60aadf760fdaf3f0b544
+ size 40
runs/train/p1/events.out.tfevents.1658662619.nefgpu54.32602.461.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7902a3b23f6f324bb68e9fcac5172283e35e8b8366d39a26f790f33d9f164161
+ size 237508
runs/train/p1/events.out.tfevents.1658780993.nefgpu54.28912.461.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e42ed2c1aebb7df82aa3eea4a4e3fcf4b739cf29d01ea36f0546815e6d510d39
+ size 821
runs/train/p1/events.out.tfevents.1658782266.nefgpu54.32401.461.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0e43bf47fdcca2a060107d13f5519364caa48fbc21da07fec23d4d7c2202e928
+ size 40
runs/train/p1/events.out.tfevents.1658814681.nefgpu54.80450.461.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:28f1dd62d5aa1b4a4f1aebf1e3b9a195dcbaf3403684d15ae022645d734358cd
+ size 40
runs/train/p1/events.out.tfevents.1658822505.nefgpu54.33886.461.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:028a860984ba3a4c84fdceb9408e969930c186b9b1e609c8d944fd4549848f35
+ size 237640
runs/train/p1/events.out.tfevents.1658978757.nefgpu54.33349.463.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8d53cf9d245fb7b4ce9b318701b924004b4315c6c1df5c56443f48ec2859a5f2
+ size 40
runs/train/p1/events.out.tfevents.1659006537.nefgpu54.32289.463.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a2831d6cc6060e6e01a9173ef21f2cd8388ff6e3a28be7ce52db5b7a97031de
+ size 40
runs/train/p1/events.out.tfevents.1659006973.nefgpu54.34472.463.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:afc73cfd2913a9332bda9226fe81e408c3a15ec2f074b1397507d4fc3ef16924
+ size 269320
runs/train/p2/events.out.tfevents.1659133070.nefgpu54.1618.463.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3e479289d46f17b0cd2af8c4d9210b70e5cf5c0b4b8c8f9fa9f288aa7fcec6e8
+ size 197908
runs/train/p2/events.out.tfevents.1659427595.nefgpu54.33725.463.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b704c973c7e26b0b484341b42cf4af42bd34d4de43950a8ef1d11568876f95d3
+ size 14296
runs/train/p2/events.out.tfevents.1659993621.nefgpu54.15278.463.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e87c83148232ebd9dd3a75bc41b0868bc93bf307a863c263959ab586c53623a2
+ size 42808
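Each event file above is committed as a Git LFS pointer, a short text file of `key value` lines (`version`, `oid`, `size`). Several pointers share the same `oid`, so their underlying objects are byte-identical, all of them 40 bytes. Examples are the `eval` and `train/p1` files for the runs stamped 1658782266, 1658814681, 1658978757 and 1659006537. A minimal sketch for spotting such duplicates, using two of these pointers plus one distinct pointer copied from this commit:

```python
from collections import defaultdict
from typing import Dict, List


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse a Git LFS pointer: one 'key value' pair per line (version, oid, size)."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def duplicate_objects(pointers: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each oid shared by more than one path to the list of those paths."""
    by_oid: Dict[str, List[str]] = defaultdict(list)
    for path, text in pointers.items():
        by_oid[parse_lfs_pointer(text)["oid"]].append(path)
    return {oid: paths for oid, paths in by_oid.items() if len(paths) > 1}


_SHARED = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:0e43bf47fdcca2a060107d13f5519364caa48fbc21da07fec23d4d7c2202e928\n"
    "size 40\n"
)

POINTERS = {
    "runs/eval/events.out.tfevents.1658782266.nefgpu54.32401.453.v2": _SHARED,
    "runs/train/p1/events.out.tfevents.1658782266.nefgpu54.32401.461.v2": _SHARED,
    "runs/train/p1/events.out.tfevents.1658662619.nefgpu54.32602.461.v2": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:7902a3b23f6f324bb68e9fcac5172283e35e8b8366d39a26f790f33d9f164161\n"
        "size 237508\n"
    ),
}

if __name__ == "__main__":
    for oid, paths in duplicate_objects(POINTERS).items():
        print(oid, paths)
```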
runs/training_summary.txt ADDED
@@ -0,0 +1,24 @@
+ {
+ "total_training_steps": 3400,
+ "train_loss": 7.284830570220947,
+ "last_train_metrics_train_perf": 266.8984680175781,
+ "last_train_metrics_total_loss": 7.284830570220947,
+ "last_train_metrics_masked_lm_accuracy": 0.6975675821304321,
+ "last_train_metrics_masked_lm_loss": 1.4581712484359741,
+ "last_train_metrics_sampled_masked_lm_accuracy": 0.6167887449264526,
+ "last_train_metrics_disc_loss": 0.1190570518374443,
+ "last_train_metrics_disc_auc": 0.0,
+ "last_train_metrics_disc_accuracy": 0.960852861404419,
+ "last_train_metrics_disc_precision": 0.7765898704528809,
+ "last_train_metrics_disc_recall": 0.38164031505584717,
+ "eval_metrics_train_perf": 0.0,
+ "eval_metrics_total_loss": 0.0,
+ "eval_metrics_masked_lm_accuracy": 0.0,
+ "eval_metrics_masked_lm_loss": 0.0,
+ "eval_metrics_sampled_masked_lm_accuracy": 0.0,
+ "eval_metrics_disc_loss": 0.0,
+ "eval_metrics_disc_auc": 0.0,
+ "eval_metrics_disc_accuracy": 0.0,
+ "eval_metrics_disc_precision": 0.0,
+ "eval_metrics_disc_recall": 0.0
+ }
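The summary reports the discriminator's precision (about 0.78) and recall (about 0.38) separately. The sketch below combines them into an F1 score and turns the MLM loss into a perplexity. The perplexity step assumes the loss is a mean cross-entropy in nats, which the file does not state. `last_train_metrics_disc_auc` and every `eval_metrics_*` entry are exactly 0.0. This most likely means those metrics were not computed for this run, rather than being genuine results. The helper names are our own.

```python
import json
import math


def load_summary(path: str) -> dict:
    """runs/training_summary.txt is plain JSON despite its .txt extension."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(metrics: dict) -> dict:
    """Derive a few readable numbers from the training summary."""
    p = metrics["last_train_metrics_disc_precision"]
    r = metrics["last_train_metrics_disc_recall"]
    f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    # Assumption: masked_lm_loss is a mean cross-entropy in nats.
    ppl = math.exp(metrics["last_train_metrics_masked_lm_loss"])
    eval_vals = [v for k, v in metrics.items() if k.startswith("eval_metrics_")]
    return {
        "disc_f1": f1,
        "mlm_perplexity": ppl,
        "eval_all_zero": bool(eval_vals) and all(v == 0.0 for v in eval_vals),
    }


# A subset of the values above; in practice use load_summary("runs/training_summary.txt").
SAMPLE = {
    "last_train_metrics_masked_lm_loss": 1.4581712484359741,
    "last_train_metrics_disc_precision": 0.7765898704528809,
    "last_train_metrics_disc_recall": 0.38164031505584717,
    "eval_metrics_total_loss": 0.0,
    "eval_metrics_disc_accuracy": 0.0,
}

if __name__ == "__main__":
    s = summarize(SAMPLE)
    print(f"disc F1: {s['disc_f1']:.3f}")                # disc F1: 0.512
    print(f"MLM perplexity: {s['mlm_perplexity']:.2f}")  # MLM perplexity: 4.30
    print(s["eval_all_zero"])                            # True
```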
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"bos_token": "[CLS]", "eos_token": "[SEP]", "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
spm.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eaf6658b4c5f33b1a6092e07deec6f921e4c6e87bf3068d109a2f1fd44849b50
+ size 808787
tf_model.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d025d27d13d2430bd1857bf8198a5f1dab0a1ee41a95eacf25fbc13c5afcb45a
+ size 238688520
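The pointer records only the object's size and SHA-256 digest. The real `tf_model.h5` (about 239 MB) lives in LFS storage. Below is a minimal sketch for checking a downloaded copy against these two values. It assumes the file has already been fetched to a local path of your choice; the `tf_model.h5` path in the demo is hypothetical. The file is hashed in chunks so the whole model never has to fit in memory.

```python
import hashlib
import os


def verify_against_pointer(path: str, oid: str, size: int, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` matches a Git LFS pointer's size and sha256 oid."""
    algo, _, expected = oid.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    if os.path.getsize(path) != size:
        return False  # cheap check first; skips hashing on obvious mismatches
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == expected


if __name__ == "__main__":
    local = "tf_model.h5"  # hypothetical local download path
    if os.path.exists(local):
        ok = verify_against_pointer(
            local,
            "sha256:d025d27d13d2430bd1857bf8198a5f1dab0a1ee41a95eacf25fbc13c5afcb45a",
            238688520,
        )
        print("matches pointer" if ok else "does NOT match pointer")
```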
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer_config.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "do_lower_case": false,
+ "bos_token": "[CLS]",
+ "eos_token": "[SEP]",
+ "unk_token": "[UNK]",
+ "sep_token": "[SEP]",
+ "pad_token": "[PAD]",
+ "cls_token": "[CLS]",
+ "mask_token": "[MASK]",
+ "split_by_punct": false,
+ "special_tokens_map_file": null,
+ "name_or_path": "vocab/camembert-deberta/",
+ "sp_model_kwargs": {},
+ "tokenizer_file": null,
+ "tokenizer_class": "DebertaV2Tokenizer"
+ }
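The tokenizer is a SentencePiece-based `DebertaV2Tokenizer` (backed by `spm.model`) with `[CLS]`, `[SEP]` and `[PAD]` as special tokens. As we understand the transformers implementation, it frames a single segment as `[CLS] A [SEP]` and a pair as `[CLS] A [SEP] B [SEP]`. Since `config.json` sets `type_vocab_size` to 0, segment ids should not be embedded by the model. The sketch reproduces that framing and right-padding on already-split token strings. The helper names and the example pieces are hypothetical, and real code should simply call the tokenizer.

```python
from typing import List, Optional, Tuple

# Special tokens as declared in special_tokens_map.json / tokenizer_config.json.
CLS, SEP, PAD = "[CLS]", "[SEP]", "[PAD]"


def build_inputs(tokens_a: List[str], tokens_b: Optional[List[str]] = None) -> List[str]:
    """Frame one segment as [CLS] A [SEP], or a pair as [CLS] A [SEP] B [SEP]."""
    if tokens_b is None:
        return [CLS] + tokens_a + [SEP]
    return [CLS] + tokens_a + [SEP] + tokens_b + [SEP]


def pad_to(tokens: List[str], length: int) -> Tuple[List[str], List[int]]:
    """Right-pad with [PAD]; return the tokens and an attention mask (1 = real token)."""
    if len(tokens) > length:
        raise ValueError("sequence is longer than the target length")
    n_pad = length - len(tokens)
    return tokens + [PAD] * n_pad, [1] * len(tokens) + [0] * n_pad


if __name__ == "__main__":
    # Hypothetical SentencePiece pieces; "▁" marks the start of a word.
    premise = ["▁Le", "▁chat", "▁dort", "."]
    hypothesis = ["▁Un", "▁animal", "▁dort", "."]
    tokens, mask = pad_to(build_inputs(premise, hypothesis), 14)
    print(tokens)
    print(mask)
```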