ANTOUN Wissam committed on
Commit
f460d39
1 Parent(s): 109da43

added model files

Files changed (35)
  1. .gitattributes +1 -0
  2. README.md +94 -0
  3. added_tokens.json +1 -0
  4. ckpt-3400.ckpt-137.data-00000-of-00001 +3 -0
  5. ckpt-3400.ckpt-137.index +3 -0
  6. config.json +40 -0
  7. pytorch_model.bin +3 -0
  8. runs/.gitkeep +0 -0
  9. runs/eval/events.out.tfevents.1658662619.nefgpu54.32602.453.v2 +3 -0
  10. runs/eval/events.out.tfevents.1658780993.nefgpu54.28912.453.v2 +3 -0
  11. runs/eval/events.out.tfevents.1658782266.nefgpu54.32401.453.v2 +3 -0
  12. runs/eval/events.out.tfevents.1658814681.nefgpu54.80450.453.v2 +3 -0
  13. runs/eval/events.out.tfevents.1658822505.nefgpu54.33886.453.v2 +3 -0
  14. runs/eval/events.out.tfevents.1658978757.nefgpu54.33349.455.v2 +3 -0
  15. runs/eval/events.out.tfevents.1659006537.nefgpu54.32289.455.v2 +3 -0
  16. runs/eval/events.out.tfevents.1659006973.nefgpu54.34472.455.v2 +3 -0
  17. runs/eval/events.out.tfevents.1659133070.nefgpu54.1618.455.v2 +3 -0
  18. runs/eval/events.out.tfevents.1659427595.nefgpu54.33725.455.v2 +3 -0
  19. runs/eval/events.out.tfevents.1659993621.nefgpu54.15278.455.v2 +3 -0
  20. runs/train/p1/events.out.tfevents.1658662619.nefgpu54.32602.461.v2 +3 -0
  21. runs/train/p1/events.out.tfevents.1658780993.nefgpu54.28912.461.v2 +3 -0
  22. runs/train/p1/events.out.tfevents.1658782266.nefgpu54.32401.461.v2 +3 -0
  23. runs/train/p1/events.out.tfevents.1658814681.nefgpu54.80450.461.v2 +3 -0
  24. runs/train/p1/events.out.tfevents.1658822505.nefgpu54.33886.461.v2 +3 -0
  25. runs/train/p1/events.out.tfevents.1658978757.nefgpu54.33349.463.v2 +3 -0
  26. runs/train/p1/events.out.tfevents.1659006537.nefgpu54.32289.463.v2 +3 -0
  27. runs/train/p1/events.out.tfevents.1659006973.nefgpu54.34472.463.v2 +3 -0
  28. runs/train/p2/events.out.tfevents.1659133070.nefgpu54.1618.463.v2 +3 -0
  29. runs/train/p2/events.out.tfevents.1659427595.nefgpu54.33725.463.v2 +3 -0
  30. runs/train/p2/events.out.tfevents.1659993621.nefgpu54.15278.463.v2 +3 -0
  31. runs/training_summary.txt +24 -0
  32. special_tokens_map.json +1 -0
  33. spm.model +3 -0
  34. tokenizer.json +0 -0
  35. tokenizer_config.json +17 -0
.gitattributes CHANGED
@@ -32,3 +32,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
32
  *.zip filter=lfs diff=lfs merge=lfs -text
33
  *.zst filter=lfs diff=lfs merge=lfs -text
34
  *tfevents* filter=lfs diff=lfs merge=lfs -text
35
+ *.ckpt-* filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,97 @@
1
  ---
2
  license: mit
3
  ---
4
+
5
+ # CamemBERTa: A French language model based on DeBERTa V3
6
+
7
+ CamemBERTa is a French language model based on DeBERTa V3, which is DeBERTa V2 with ELECTRA-style pretraining using the Replaced Token Detection (RTD) objective.
8
+ RTD uses a generator model, trained using the MLM objective, to replace masked tokens with plausible candidates, and a discriminator model trained to detect which tokens were replaced by the generator.
9
+ Usually the generator and discriminator share the same embedding matrix, but the authors of DeBERTa V3 propose a new technique, gradient-disentangled embedding sharing (GDES), to disentangle the gradients of the shared embedding between the generator and the discriminator.
10
+
11
+ *This is the first publicly available implementation of DeBERTa V3, and the first publicly available DeBERTa V3 model outside of the original Microsoft release.*
12
+
13
+ Preprint Paper: https://inria.hal.science/hal-03963729/
14
+ Pre-training Code: https://gitlab.inria.fr/almanach/CamemBERTa
15
+
16
+ ## How to use CamemBERTa
17
+ Our pretrained weights are available on the HuggingFace model hub. You can load them using the following code:
18
+
19
+ ```python
20
+ from transformers import AutoTokenizer, AutoModel, AutoModelForMaskedLM
21
+
22
+ CamemBERTa = AutoModel.from_pretrained("almanach/camemberta-base")
23
+ tokenizer = AutoTokenizer.from_pretrained("almanach/camemberta-base")
24
+
25
+ CamemBERTa_gen = AutoModelForMaskedLM.from_pretrained("almanach/camemberta-base-generator")
26
+ tokenizer_gen = AutoTokenizer.from_pretrained("almanach/camemberta-base-generator")
27
+ ```
28
+
29
+ We also include the TF2 weights, including the RTD head for the discriminator and the MLM head for the generator.
30
+ CamemBERTa is compatible with most finetuning scripts from the transformers library.
31
+
32
+ ## Pretraining Setup
33
+
34
+ The model was trained on the French subset of the CCNet corpus (the same subset used in CamemBERT and PaGNOL) and is available on the HuggingFace model hub: CamemBERTa and CamemBERTa Generator.
35
+ To speed up the pre-training experiments, the pre-training was split into two phases:
36
+ in phase 1, the model is trained with a maximum sequence length of 128 tokens for 10,000 steps with 2,000 warm-up steps and a very large batch size of 67,584.
37
+ In phase 2, maximum sequence length is increased to the full model capacity of 512 tokens for 3,300 steps with 200 warm-up steps and a batch size of 27,648.
38
+ In total, the model has seen 133B tokens, compared to 419B tokens for CamemBERT-CCNet, which was trained for 100K steps; this represents roughly 30% of CamemBERT’s full training.
39
+ For a fair comparison, we trained a RoBERTa model, CamemBERT30%, using the exact same pretraining setup but with the MLM objective.
40
+
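The token counts in the pretraining setup can be checked with a few lines of arithmetic. This is a minimal sketch, not taken from the training code: it assumes every sequence is filled to the phase's maximum length, so the result is an upper bound, and the variable names are illustrative.

```python
# Upper-bound estimate of tokens seen during pre-training, assuming every
# sequence in a batch is filled to the phase's maximum length.
phases = [
    # (name, steps, batch size in sequences, max sequence length)
    ("phase 1", 10_000, 67_584, 128),
    ("phase 2", 3_300, 27_648, 512),
]

total_tokens = 0
for name, steps, batch_size, seq_len in phases:
    tokens = steps * batch_size * seq_len
    total_tokens += tokens
    print(f"{name}: {tokens / 1e9:.1f}B tokens")

camembert_tokens = 419e9  # figure quoted in the README for CamemBERT-CCNet
print(f"total: {total_tokens / 1e9:.1f}B tokens")
print(f"share of CamemBERT's budget: {total_tokens / camembert_tokens:.0%}")
```

Under that assumption, the two phases contribute about 86.5B and 46.7B tokens, which is consistent with the 133B figure and with the "roughly 30%" comparison.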
41
+ ## Pretraining Loss Curves
42
+ Check the TensorBoard logs and plots.
43
+
44
+ ## Fine-tuning results
45
+
46
+ Datasets: POS tagging and Dependency Parsing (GSD, Rhapsodie, Sequoia, FSMB), NER (FTB), the FLUE benchmark (XNLI, CLS, PAWS-X), and the French Question Answering Dataset (FQuAD)
47
+
48
+ | Model | UPOS | LAS | NER | CLS | PAWS-X | XNLI | F1 (FQuAD) | EM (FQuAD) |
49
+ |-------------------|-----------|-----------|-----------|-----------|-----------|-----------|------------|------------|
50
+ | CamemBERT (CCNet) | **97.59** | **88.69** | 89.97 | 94.62 | 91.36 | 81.95 | 80.98 | **62.51** |
51
+ | CamemBERT (30%) | 97.53 | 87.98 | **91.04** | 93.28 | 88.94 | 79.89 | 75.14 | 56.19 |
52
+ | CamemBERTa | 97.57 | 88.55 | 90.33 | **94.92** | **91.67** | **82.00** | **81.15** | 62.01 |
53
+
54
+ The following table compares CamemBERTa's performance on XNLI against other models under different training setups, which demonstrates the data efficiency of CamemBERTa.
55
+
56
+
57
+ | Model | XNLI (Acc.) | Training Steps | Tokens seen in pre-training | Dataset Size in Tokens |
58
+ |-------------------|-------------|----------------|-----------------------------|------------------------|
59
+ | mDeBERTa | 84.4 | 500k | 2T | 2.5T |
60
+ | CamemBERTa | 82.0 | 33k | 0.139T | 0.319T |
61
+ | XLM-R | 81.4 | 1.5M | 6T | 2.5T |
62
+ | CamemBERT - CCNet | 81.95 | 100k | 0.419T | 0.319T |
63
+
64
+ *Note: The CamemBERTa training steps were adjusted for a batch size of 8192.*
65
+
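To make the data-efficiency comparison concrete, the sketch below expresses each model's pre-training token budget as a multiple of CamemBERTa's. The numbers are copied from the XNLI table; the dictionary layout and variable names are illustrative only.

```python
# (tokens seen in pre-training, in trillions; XNLI accuracy), from the XNLI table.
models = {
    "mDeBERTa": (2.0, 84.4),
    "CamemBERTa": (0.139, 82.0),
    "XLM-R": (6.0, 81.4),
    "CamemBERT - CCNet": (0.419, 81.95),
}

base_tokens, base_acc = models["CamemBERTa"]
ratios = {}
for name, (tokens, acc) in models.items():
    # How many times more tokens this model saw than CamemBERTa.
    ratios[name] = tokens / base_tokens
    print(f"{name:18s} {ratios[name]:5.1f}x tokens, "
          f"{acc - base_acc:+.2f} XNLI points vs CamemBERTa")
```

Read this way, mDeBERTa gains 2.4 XNLI points with roughly 14 times the tokens, while XLM-R and CamemBERT (CCNet) see more tokens without a higher score.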
66
+ ## License
67
+
68
+ The public model weights are licensed under the MIT License.
69
+ This code is licensed under the Apache License 2.0.
70
+
71
+ ## Citation
72
+
73
+ Paper accepted to Findings of ACL 2023.
74
+
75
+ You can use the preprint citation for now:
76
+
77
+ ```
78
+ @article{antoun2023camemberta,
79
+ TITLE = {{Data-Efficient French Language Modeling with CamemBERTa}},
80
+ AUTHOR = {Antoun, Wissam and Sagot, Beno{\^i}t and Seddah, Djam{\'e}},
81
+ URL = {https://inria.hal.science/hal-03963729},
82
+ NOTE = {working paper or preprint},
83
+ YEAR = {2023},
84
+ MONTH = Jan,
85
+ PDF = {https://inria.hal.science/hal-03963729/file/French_DeBERTa___ACL_2023%20to%20be%20uploaded.pdf},
86
+ HAL_ID = {hal-03963729},
87
+ HAL_VERSION = {v1},
88
+ }
89
+ ```
90
+
91
+ ## Contact
92
+
93
+ Wissam Antoun: `wissam (dot) antoun (at) inria (dot) fr`
94
+
95
+ Benoit Sagot: `benoit (dot) sagot (at) inria (dot) fr`
96
+
97
+ Djame Seddah: `djame (dot) seddah (at) inria (dot) fr`
added_tokens.json ADDED
@@ -0,0 +1 @@
 
 
1
+ {"[UNK]": 32001, "[PAD]": 32002}
ckpt-3400.ckpt-137.data-00000-of-00001 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ec796fc8fbd5134ae5c0d5a612c60a3335075b94f83fee1ccf03bb78e5ffe2ee
3
+ size 1766899682
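The large binary files in this commit are stored as Git LFS pointers: three text lines giving the spec version, the content hash, and the size in bytes. A minimal sketch of reading one follows; `parse_lfs_pointer` is a hypothetical helper written for illustration and is not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents for the TF2 checkpoint data file shown in this diff.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ec796fc8fbd5134ae5c0d5a612c60a3335075b94f83fee1ccf03bb78e5ffe2ee
size 1766899682
"""

pointer = parse_lfs_pointer(pointer_text)
print(f"{pointer['size_bytes'] / 2**30:.2f} GiB, "
      f"{pointer['hash_algo']} digest of {len(pointer['digest'])} hex chars")
```

The same parser applies to every other pointer in the commit, including the 40-byte TensorBoard event files.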
ckpt-3400.ckpt-137.index ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:99c19dc261521f491d38078f77beba984f630cf4d936609186b4f4f89e437ec9
3
+ size 92494
config.json ADDED
@@ -0,0 +1,40 @@
1
+ {
2
+ "_name_or_path": "anondeb/debv3-base",
3
+ "amp": true,
4
+ "architectures": [
5
+ "DebertaV2Model"
6
+ ],
7
+ "attention_probs_dropout_prob": 0.1,
8
+ "conv_act": "gelu",
9
+ "conv_kernel_size": 3,
10
+ "embedding_size": 768,
11
+ "hidden_act": "gelu",
12
+ "hidden_dropout_prob": 0.1,
13
+ "hidden_size": 768,
14
+ "initializer_range": 0.02,
15
+ "intermediate_size": 3072,
16
+ "layer_norm_eps": 1e-07,
17
+ "max_position_embeddings": 512,
18
+ "max_relative_positions": -1,
19
+ "model_name": "camemberta-base",
20
+ "model_type": "deberta-v2",
21
+ "norm_rel_ebd": "layer_norm",
22
+ "num_attention_heads": 12,
23
+ "num_hidden_layers": 12,
24
+ "pad_token_id": 0,
25
+ "pooler_dropout": 0,
26
+ "pooler_hidden_act": "gelu",
27
+ "pooler_hidden_size": 768,
28
+ "pos_att_type": [
29
+ "p2c",
30
+ "c2p"
31
+ ],
32
+ "position_biased_input": false,
33
+ "position_buckets": 256,
34
+ "relative_attention": true,
35
+ "share_att_key": true,
36
+ "torch_dtype": "float32",
37
+ "transformers_version": "4.20.1",
38
+ "type_vocab_size": 0,
39
+ "vocab_size": 32008
40
+ }
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ae6d1d5495d283736fbf8808c8177c364638abfef16c20f3d4a65683ef42f23e
3
+ size 447289360
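The pointer size gives a quick bound on the model's parameter count. The sketch below assumes the weights are stored as 4-byte floats, as `torch_dtype: float32` in config.json suggests. Because the file also carries serialization metadata, the result is an upper bound rather than an exact count.

```python
checkpoint_bytes = 447_289_360  # "size" field of the pytorch_model.bin LFS pointer
bytes_per_param = 4             # float32, per "torch_dtype" in config.json

# Upper bound: the file also holds serialization metadata, not only weights.
max_params = checkpoint_bytes // bytes_per_param

# Token embedding matrix implied by config.json (vocab_size x embedding_size).
embedding_params = 32_008 * 768

print(f"at most {max_params / 1e6:.1f}M parameters")
print(f"token embeddings: {embedding_params / 1e6:.1f}M "
      f"({embedding_params / max_params:.0%} of the bound)")
```

The bound comes out a little under 112M parameters, of which the token embeddings account for roughly a fifth.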
runs/.gitkeep ADDED
File without changes
runs/eval/events.out.tfevents.1658662619.nefgpu54.32602.453.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:abff33d04db4af03dd3c8fd067478e73a80d56dc326f7372a9cc18e8a776dc96
3
+ size 40
runs/eval/events.out.tfevents.1658780993.nefgpu54.28912.453.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:077bf001772d30e10dc5a0a725f4b82e27b4871249297630e8e6d80744c67cb7
3
+ size 40
runs/eval/events.out.tfevents.1658782266.nefgpu54.32401.453.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0e43bf47fdcca2a060107d13f5519364caa48fbc21da07fec23d4d7c2202e928
3
+ size 40
runs/eval/events.out.tfevents.1658814681.nefgpu54.80450.453.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:28f1dd62d5aa1b4a4f1aebf1e3b9a195dcbaf3403684d15ae022645d734358cd
3
+ size 40
runs/eval/events.out.tfevents.1658822505.nefgpu54.33886.453.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0e60ab04655c96abd22edca70f5f3a9b07ae7a2eba4be9064968339b2bc9f047
3
+ size 40
runs/eval/events.out.tfevents.1658978757.nefgpu54.33349.455.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8d53cf9d245fb7b4ce9b318701b924004b4315c6c1df5c56443f48ec2859a5f2
3
+ size 40
runs/eval/events.out.tfevents.1659006537.nefgpu54.32289.455.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1a2831d6cc6060e6e01a9173ef21f2cd8388ff6e3a28be7ce52db5b7a97031de
3
+ size 40
runs/eval/events.out.tfevents.1659006973.nefgpu54.34472.455.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4e4126a80bf4195601d8ac0494a2d13bea3200698e79fbe31b58888fad2fd578
3
+ size 40
runs/eval/events.out.tfevents.1659133070.nefgpu54.1618.455.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:263099010447ac4ef8020ef5df72b0070baaeee41e257191ac5ed9310e73d5fd
3
+ size 40
runs/eval/events.out.tfevents.1659427595.nefgpu54.33725.455.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6412c40d4909dfccca3144c46e5bc815af8d12955cffd16610259230abe68f39
3
+ size 40
runs/eval/events.out.tfevents.1659993621.nefgpu54.15278.455.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:84a318839fe4bf4bc681eae19c894f0cdc49a532d70a60aadf760fdaf3f0b544
3
+ size 40
runs/train/p1/events.out.tfevents.1658662619.nefgpu54.32602.461.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7902a3b23f6f324bb68e9fcac5172283e35e8b8366d39a26f790f33d9f164161
3
+ size 237508
runs/train/p1/events.out.tfevents.1658780993.nefgpu54.28912.461.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e42ed2c1aebb7df82aa3eea4a4e3fcf4b739cf29d01ea36f0546815e6d510d39
3
+ size 821
runs/train/p1/events.out.tfevents.1658782266.nefgpu54.32401.461.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0e43bf47fdcca2a060107d13f5519364caa48fbc21da07fec23d4d7c2202e928
3
+ size 40
runs/train/p1/events.out.tfevents.1658814681.nefgpu54.80450.461.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:28f1dd62d5aa1b4a4f1aebf1e3b9a195dcbaf3403684d15ae022645d734358cd
3
+ size 40
runs/train/p1/events.out.tfevents.1658822505.nefgpu54.33886.461.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:028a860984ba3a4c84fdceb9408e969930c186b9b1e609c8d944fd4549848f35
3
+ size 237640
runs/train/p1/events.out.tfevents.1658978757.nefgpu54.33349.463.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8d53cf9d245fb7b4ce9b318701b924004b4315c6c1df5c56443f48ec2859a5f2
3
+ size 40
runs/train/p1/events.out.tfevents.1659006537.nefgpu54.32289.463.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1a2831d6cc6060e6e01a9173ef21f2cd8388ff6e3a28be7ce52db5b7a97031de
3
+ size 40
runs/train/p1/events.out.tfevents.1659006973.nefgpu54.34472.463.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:afc73cfd2913a9332bda9226fe81e408c3a15ec2f074b1397507d4fc3ef16924
3
+ size 269320
runs/train/p2/events.out.tfevents.1659133070.nefgpu54.1618.463.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3e479289d46f17b0cd2af8c4d9210b70e5cf5c0b4b8c8f9fa9f288aa7fcec6e8
3
+ size 197908
runs/train/p2/events.out.tfevents.1659427595.nefgpu54.33725.463.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b704c973c7e26b0b484341b42cf4af42bd34d4de43950a8ef1d11568876f95d3
3
+ size 14296
runs/train/p2/events.out.tfevents.1659993621.nefgpu54.15278.463.v2 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e87c83148232ebd9dd3a75bc41b0868bc93bf307a863c263959ab586c53623a2
3
+ size 42808
runs/training_summary.txt ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "total_training_steps": 3400,
3
+ "train_loss": 7.284830570220947,
4
+ "last_train_metrics_train_perf": 266.8984680175781,
5
+ "last_train_metrics_total_loss": 7.284830570220947,
6
+ "last_train_metrics_masked_lm_accuracy": 0.6975675821304321,
7
+ "last_train_metrics_masked_lm_loss": 1.4581712484359741,
8
+ "last_train_metrics_sampled_masked_lm_accuracy": 0.6167887449264526,
9
+ "last_train_metrics_disc_loss": 0.1190570518374443,
10
+ "last_train_metrics_disc_auc": 0.0,
11
+ "last_train_metrics_disc_accuracy": 0.960852861404419,
12
+ "last_train_metrics_disc_precision": 0.7765898704528809,
13
+ "last_train_metrics_disc_recall": 0.38164031505584717,
14
+ "eval_metrics_train_perf": 0.0,
15
+ "eval_metrics_total_loss": 0.0,
16
+ "eval_metrics_masked_lm_accuracy": 0.0,
17
+ "eval_metrics_masked_lm_loss": 0.0,
18
+ "eval_metrics_sampled_masked_lm_accuracy": 0.0,
19
+ "eval_metrics_disc_loss": 0.0,
20
+ "eval_metrics_disc_auc": 0.0,
21
+ "eval_metrics_disc_accuracy": 0.0,
22
+ "eval_metrics_disc_precision": 0.0,
23
+ "eval_metrics_disc_recall": 0.0
24
+ }
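The discriminator metrics in this summary combine a high accuracy with a low recall. In RTD only a minority of tokens are replaced, so accuracy alone can look high, and an F1 score on the "replaced" class gives a more balanced reading. The sketch below computes it from the last training metrics; it is an illustrative calculation, not a metric reported by the training script.

```python
import json

# Subset of runs/training_summary.txt, copied from the diff above.
summary = json.loads("""
{
  "last_train_metrics_disc_accuracy": 0.960852861404419,
  "last_train_metrics_disc_precision": 0.7765898704528809,
  "last_train_metrics_disc_recall": 0.38164031505584717
}
""")

precision = summary["last_train_metrics_disc_precision"]
recall = summary["last_train_metrics_disc_recall"]

# Harmonic mean of precision and recall for the "replaced token" class.
f1 = 2 * precision * recall / (precision + recall)
print(f"discriminator F1 on replaced tokens: {f1:.3f}")
```

The F1 comes out at about 0.51, well below the 0.96 accuracy. Note also that every `eval_metrics_*` field and `disc_auc` are 0.0 in this summary, which suggests they were not computed for this run.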
special_tokens_map.json ADDED
@@ -0,0 +1 @@
 
 
1
+ {"bos_token": "[CLS]", "eos_token": "[SEP]", "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
spm.model ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:eaf6658b4c5f33b1a6092e07deec6f921e4c6e87bf3068d109a2f1fd44849b50
3
+ size 808787
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,17 @@
1
+ {
2
+ "do_lower_case": false,
3
+ "bos_token": "[CLS]",
4
+ "eos_token": "[SEP]",
5
+ "unk_token": "[UNK]",
6
+ "sep_token": "[SEP]",
7
+ "pad_token": "[PAD]",
8
+ "cls_token": "[CLS]",
9
+ "mask_token": "[MASK]",
10
+ "split_by_punct": false,
11
+ "special_tokens_map_file": null,
12
+ "name_or_path": "vocab/camembert-deberta/",
13
+ "sp_model_kwargs": {},
14
+ "tokenizer_file": null,
15
+ "tokenizer_class": "DebertaV2Tokenizer",
16
+ "vocab_type": "spm"
17
+ }