wissamantoun committed on May 3, 2023

Commit 7cc35b4 • 1 Parent(s): 65a1089

added model files

Files changed (33)

.gitattributes +1 -0
README.md +101 -0
added_tokens.json +1 -0
config.json +38 -0
runs/.gitkeep +0 -0
runs/eval/events.out.tfevents.1658662619.nefgpu54.32602.453.v2 +3 -0
runs/eval/events.out.tfevents.1658780993.nefgpu54.28912.453.v2 +3 -0
runs/eval/events.out.tfevents.1658782266.nefgpu54.32401.453.v2 +3 -0
runs/eval/events.out.tfevents.1658814681.nefgpu54.80450.453.v2 +3 -0
runs/eval/events.out.tfevents.1658822505.nefgpu54.33886.453.v2 +3 -0
runs/eval/events.out.tfevents.1658978757.nefgpu54.33349.455.v2 +3 -0
runs/eval/events.out.tfevents.1659006537.nefgpu54.32289.455.v2 +3 -0
runs/eval/events.out.tfevents.1659006973.nefgpu54.34472.455.v2 +3 -0
runs/eval/events.out.tfevents.1659133070.nefgpu54.1618.455.v2 +3 -0
runs/eval/events.out.tfevents.1659427595.nefgpu54.33725.455.v2 +3 -0
runs/eval/events.out.tfevents.1659993621.nefgpu54.15278.455.v2 +3 -0
runs/train/p1/events.out.tfevents.1658662619.nefgpu54.32602.461.v2 +3 -0
runs/train/p1/events.out.tfevents.1658780993.nefgpu54.28912.461.v2 +3 -0
runs/train/p1/events.out.tfevents.1658782266.nefgpu54.32401.461.v2 +3 -0
runs/train/p1/events.out.tfevents.1658814681.nefgpu54.80450.461.v2 +3 -0
runs/train/p1/events.out.tfevents.1658822505.nefgpu54.33886.461.v2 +3 -0
runs/train/p1/events.out.tfevents.1658978757.nefgpu54.33349.463.v2 +3 -0
runs/train/p1/events.out.tfevents.1659006537.nefgpu54.32289.463.v2 +3 -0
runs/train/p1/events.out.tfevents.1659006973.nefgpu54.34472.463.v2 +3 -0
runs/train/p2/events.out.tfevents.1659133070.nefgpu54.1618.463.v2 +3 -0
runs/train/p2/events.out.tfevents.1659427595.nefgpu54.33725.463.v2 +3 -0
runs/train/p2/events.out.tfevents.1659993621.nefgpu54.15278.463.v2 +3 -0
runs/training_summary.txt +24 -0
special_tokens_map.json +1 -0
spm.model +3 -0
tf_model.h5 +3 -0
tokenizer.json +0 -0
tokenizer_config.json +16 -0
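
The event files listed above follow a dot-separated naming pattern. The sketch below splits one such name into its fields. The field meanings (Unix start time, host name, process id) are an assumption based on the usual TensorFlow summary-writer convention, and `parse_tfevents_name` is a hypothetical helper, not part of this repository. It also assumes the host name contains no dots, which holds for the names in this commit.

```python
from datetime import datetime, timezone


def parse_tfevents_name(path: str) -> dict:
    """Split a TensorBoard event file name into its fields.

    Assumed layout: events.out.tfevents.<unix_time>.<host>.<pid>.<uid>.v2
    """
    name = path.rsplit("/", 1)[-1]  # drop any directory prefix
    parts = name.split(".")
    if parts[:3] != ["events", "out", "tfevents"]:
        raise ValueError(f"not a tfevents file name: {name}")
    return {
        # Interpreted as seconds since the Unix epoch (assumption).
        "started_utc": datetime.fromtimestamp(int(parts[3]), tz=timezone.utc),
        "host": parts[4],
        "pid": int(parts[5]),
    }


info = parse_tfevents_name(
    "runs/eval/events.out.tfevents.1658662619.nefgpu54.32602.453.v2"
)
print(info["started_utc"].isoformat(), info["host"], info["pid"])
```

Under that reading, the timestamps in the file names place the logged runs in late July and early August 2022.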

.gitattributes CHANGED

@@ -32,3 +32,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.ckpt-* filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,3 +1,104 @@
 ---
 license: mit
+language: fr
+datasets:
+    - ccnet
+tags:
+  - deberta
+  - deberta-v3
 ---
+# CamemBERTa: A French language model based on DeBERTa V3
+CamemBERTa is a French language model based on DeBERTa V3, which is DeBERTa V2 with ELECTRA-style pretraining using the Replaced Token Detection (RTD) objective.
+RTD uses a generator model, trained using the MLM objective, to replace masked tokens with plausible candidates, and a discriminator model trained to detect which tokens were replaced by the generator.
+Usually the generator and discriminator share the same embedding matrix, but the authors of DeBERTa V3 propose a new technique, called gradient-disentangled embedding sharing (GDES), to disentangle the gradients of the embedding shared between the generator and the discriminator.
+*This is the first publicly available implementation of DeBERTa V3, and the first publicly available DeBERTa V3 model outside of the original Microsoft release.*
+Preprint Paper: https://inria.hal.science/hal-03963729/
+Pre-training Code: https://gitlab.inria.fr/almanach/CamemBERTa
+## How to use CamemBERTa
+Our pretrained weights are available on the HuggingFace model hub. You can load them using the following code:
+```python
+from transformers import AutoTokenizer, AutoModel, AutoModelForMaskedLM
+CamemBERTa = AutoModel.from_pretrained("almanach/camemberta-base")
+tokenizer = AutoTokenizer.from_pretrained("almanach/camemberta-base")
+CamemBERTa_gen = AutoModelForMaskedLM.from_pretrained("almanach/camemberta-base-generator")
+tokenizer_gen = AutoTokenizer.from_pretrained("almanach/camemberta-base-generator")
+```
+We also provide the TF2 weights, including the RTD head for the discriminator and the MLM head for the generator.
+CamemBERTa is compatible with most finetuning scripts from the transformers library.
+## Pretraining Setup
+The model was trained on the French subset of the CCNet corpus (the same subset used in CamemBERT and PaGNOL) and is available on the HuggingFace model hub: CamemBERTa and CamemBERTa Generator.
+To speed up the pre-training experiments, pre-training was split into two phases.
+In phase 1, the model is trained with a maximum sequence length of 128 tokens for 10,000 steps, with 2,000 warm-up steps and a very large batch size of 67,584.
+In phase 2, the maximum sequence length is increased to the full model capacity of 512 tokens for 3,300 steps, with 200 warm-up steps and a batch size of 27,648.
+In total, the model sees about 133B tokens, compared to 419B tokens for CamemBERT-CCNet, which was trained for 100K steps; this represents roughly 30% of CamemBERT’s full training.
+For a fair comparison, we trained a RoBERTa model, CamemBERT30%, using the exact same pretraining setup but with the MLM objective.
+## Pretraining Loss Curves
+Check the TensorBoard logs and plots.
+## Fine-tuning results
+Datasets: POS tagging and Dependency Parsing (GSD, Rhapsodie, Sequoia, FSMB), NER (FTB), the FLUE benchmark (XNLI, CLS, PAWS-X), and the French Question Answering Dataset (FQuAD)
+| Model             | UPOS      | LAS       | NER       | CLS       | PAWS-X    | XNLI      | F1 (FQuAD) | EM (FQuAD) |
+|-------------------|-----------|-----------|-----------|-----------|-----------|-----------|------------|------------|
+| CamemBERT (CCNet) | **97.59** | **88.69** | 89.97     | 94.62     | 91.36     | 81.95     | 80.98      | **62.51**  |
+| CamemBERT (30%)   | 97.53     | 87.98     | **91.04** | 93.28     | 88.94     | 79.89     | 75.14      | 56.19      |
+| CamemBERTa        | 97.57     | 88.55     | 90.33     | **94.92** | **91.67** | **82.00** | **81.15**  | 62.01      |
+The following table compares CamemBERTa's performance on XNLI against other models under different training setups, which demonstrates the data efficiency of CamemBERTa.
+| Model             | XNLI (Acc.) | Training Steps | Tokens seen in pre-training | Dataset Size in Tokens |
+|-------------------|-------------|----------------|-----------------------------|------------------------|
+| mDeBERTa          | 84.4        | 500k           | 2T                          | 2.5T                   |
+| CamemBERTa        | 82.0        | 33k            | 0.139T                      | 0.319T                 |
+| XLM-R             | 81.4        | 1.5M           | 6T                          | 2.5T                   |
+| CamemBERT - CCNet | 81.95       | 100k           | 0.419T                      | 0.319T                 |
+*Note: The CamemBERTa training steps were adjusted for a batch size of 8192.*
+## License
+The public model weights are licensed under the MIT License.
+This code is licensed under the Apache License 2.0.
+## Citation
+The paper was accepted to Findings of ACL 2023.
+You can use the preprint citation for now:
+```
+@article{antoun2023camemberta,
+  TITLE = {{Data-Efficient French Language Modeling with CamemBERTa}},
+  AUTHOR = {Antoun, Wissam and Sagot, Beno{\^i}t and Seddah, Djam{\'e}},
+  URL = {https://inria.hal.science/hal-03963729},
+  NOTE = {working paper or preprint},
+  YEAR = {2023},
+  MONTH = Jan,
+  PDF = {https://inria.hal.science/hal-03963729/file/French_DeBERTa___ACL_2023%20to%20be%20uploaded.pdf},
+  HAL_ID = {hal-03963729},
+  HAL_VERSION = {v1},
+}
+```
+## Contact
+Wissam Antoun: `wissam (dot) antoun (at) inria (dot) fr`
+Benoit Sagot: `benoit (dot) sagot (at) inria (dot) fr`
+Djame Seddah: `djame (dot) seddah (at) inria (dot) fr`
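
The token budget quoted in the README's "Pretraining Setup" section can be checked with simple arithmetic. The sketch below multiplies steps, batch size, and sequence length for each phase. It assumes every sequence is filled to the maximum length, so it is an upper-bound estimate, and `tokens_seen` is a hypothetical helper introduced only for illustration.

```python
def tokens_seen(steps: int, batch_size: int, seq_len: int) -> int:
    """Upper bound on tokens processed, assuming fully packed sequences."""
    return steps * batch_size * seq_len


# Figures taken from the README text.
phase1 = tokens_seen(steps=10_000, batch_size=67_584, seq_len=128)
phase2 = tokens_seen(steps=3_300, batch_size=27_648, seq_len=512)
total = phase1 + phase2

# 419B tokens is the README's figure for CamemBERT-CCNet.
ratio = total / 419e9

print(f"phase 1: {phase1 / 1e9:.1f}B tokens")
print(f"phase 2: {phase2 / 1e9:.1f}B tokens")
print(f"total:   {total / 1e9:.1f}B tokens ({ratio:.0%} of CamemBERT-CCNet)")
```

Under these assumptions the two phases add up to about 133B tokens, which matches the README's "roughly 30%" statement.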

added_tokens.json ADDED

	@@ -0,0 +1 @@


+{"[UNK]": 32001, "[PAD]": 32002}

config.json ADDED

	@@ -0,0 +1,38 @@

+{
+  "amp": true,
+  "architectures": [
+    "DebertaV2ForMaskedLM"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "conv_act": "gelu",
+  "conv_kernel_size": 3,
+  "embedding_size": 768,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 256,
+  "initializer_range": 0.02,
+  "intermediate_size": 1024,
+  "layer_norm_eps": 1e-07,
+  "max_position_embeddings": 512,
+  "max_relative_positions": -1,
+  "model_name": "camemberta-base",
+  "model_type": "deberta-v2",
+  "norm_rel_ebd": "layer_norm",
+  "num_attention_heads": 4,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "pooler_dropout": 0,
+  "pooler_hidden_act": "gelu",
+  "pooler_hidden_size": 768,
+  "pos_att_type": [
+    "p2c",
+    "c2p"
+  ],
+  "position_biased_input": false,
+  "position_buckets": 256,
+  "relative_attention": true,
+  "share_att_key": true,
+  "transformers_version": "4.18.0.dev0",
+  "type_vocab_size": 0,
+  "vocab_size": 32008
+}
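
A few quantities follow directly from the values in this `config.json`. The sketch below derives the per-head attention width and the size of the token embedding table from a subset of the fields. Reading `embedding_size != hidden_size` as a sign that the embeddings are projected down to the hidden width is an assumption based on how ELECTRA-style models are usually built, not something this commit states.

```python
import json

# Subset of the fields from the config.json added in this commit.
CONFIG_SNIPPET = """
{
  "embedding_size": 768,
  "hidden_size": 256,
  "intermediate_size": 1024,
  "num_attention_heads": 4,
  "num_hidden_layers": 12,
  "vocab_size": 32008
}
"""

cfg = json.loads(CONFIG_SNIPPET)

# The hidden width must split evenly across attention heads.
assert cfg["hidden_size"] % cfg["num_attention_heads"] == 0
head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]

# Assumption: a mismatch means embeddings are projected to hidden_size.
needs_projection = cfg["embedding_size"] != cfg["hidden_size"]

# Parameters in the token embedding table alone.
embedding_params = cfg["vocab_size"] * cfg["embedding_size"]

print(f"head_dim={head_dim}, needs_projection={needs_projection}")
print(f"token embedding parameters: {embedding_params:,}")
```

The narrow hidden size combined with a `DebertaV2ForMaskedLM` architecture suggests this config describes the smaller generator rather than the discriminator, though the commit itself does not say so.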

runs/.gitkeep ADDED

File without changes

runs/eval/events.out.tfevents.1658662619.nefgpu54.32602.453.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:abff33d04db4af03dd3c8fd067478e73a80d56dc326f7372a9cc18e8a776dc96
+size 40
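
The three added lines above are a Git LFS pointer, not the file's real content: the repository stores only a spec version, a SHA-256 object id, and the byte size. The same shape repeats for every LFS-tracked file in this commit. A minimal sketch of reading such a pointer, using a hypothetical `parse_lfs_pointer` helper:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:abff33d04db4af03dd3c8fd067478e73a80d56dc326f7372a9cc18e8a776dc96\n"
    "size 40\n"
)
print(pointer["hash_algo"], pointer["size_bytes"])
```

A size of 40 bytes indicates an event file that holds little beyond its header, which is why several of the eval logs below share that size.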

runs/eval/events.out.tfevents.1658780993.nefgpu54.28912.453.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:077bf001772d30e10dc5a0a725f4b82e27b4871249297630e8e6d80744c67cb7
+size 40

runs/eval/events.out.tfevents.1658782266.nefgpu54.32401.453.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0e43bf47fdcca2a060107d13f5519364caa48fbc21da07fec23d4d7c2202e928
+size 40

runs/eval/events.out.tfevents.1658814681.nefgpu54.80450.453.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:28f1dd62d5aa1b4a4f1aebf1e3b9a195dcbaf3403684d15ae022645d734358cd
+size 40

runs/eval/events.out.tfevents.1658822505.nefgpu54.33886.453.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0e60ab04655c96abd22edca70f5f3a9b07ae7a2eba4be9064968339b2bc9f047
+size 40

runs/eval/events.out.tfevents.1658978757.nefgpu54.33349.455.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8d53cf9d245fb7b4ce9b318701b924004b4315c6c1df5c56443f48ec2859a5f2
+size 40

runs/eval/events.out.tfevents.1659006537.nefgpu54.32289.455.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1a2831d6cc6060e6e01a9173ef21f2cd8388ff6e3a28be7ce52db5b7a97031de
+size 40

runs/eval/events.out.tfevents.1659006973.nefgpu54.34472.455.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4e4126a80bf4195601d8ac0494a2d13bea3200698e79fbe31b58888fad2fd578
+size 40

runs/eval/events.out.tfevents.1659133070.nefgpu54.1618.455.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:263099010447ac4ef8020ef5df72b0070baaeee41e257191ac5ed9310e73d5fd
+size 40

runs/eval/events.out.tfevents.1659427595.nefgpu54.33725.455.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6412c40d4909dfccca3144c46e5bc815af8d12955cffd16610259230abe68f39
+size 40

runs/eval/events.out.tfevents.1659993621.nefgpu54.15278.455.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:84a318839fe4bf4bc681eae19c894f0cdc49a532d70a60aadf760fdaf3f0b544
+size 40

runs/train/p1/events.out.tfevents.1658662619.nefgpu54.32602.461.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7902a3b23f6f324bb68e9fcac5172283e35e8b8366d39a26f790f33d9f164161
+size 237508

runs/train/p1/events.out.tfevents.1658780993.nefgpu54.28912.461.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e42ed2c1aebb7df82aa3eea4a4e3fcf4b739cf29d01ea36f0546815e6d510d39
+size 821

runs/train/p1/events.out.tfevents.1658782266.nefgpu54.32401.461.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0e43bf47fdcca2a060107d13f5519364caa48fbc21da07fec23d4d7c2202e928
+size 40

runs/train/p1/events.out.tfevents.1658814681.nefgpu54.80450.461.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:28f1dd62d5aa1b4a4f1aebf1e3b9a195dcbaf3403684d15ae022645d734358cd
+size 40

runs/train/p1/events.out.tfevents.1658822505.nefgpu54.33886.461.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:028a860984ba3a4c84fdceb9408e969930c186b9b1e609c8d944fd4549848f35
+size 237640

runs/train/p1/events.out.tfevents.1658978757.nefgpu54.33349.463.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8d53cf9d245fb7b4ce9b318701b924004b4315c6c1df5c56443f48ec2859a5f2
+size 40

runs/train/p1/events.out.tfevents.1659006537.nefgpu54.32289.463.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1a2831d6cc6060e6e01a9173ef21f2cd8388ff6e3a28be7ce52db5b7a97031de
+size 40

runs/train/p1/events.out.tfevents.1659006973.nefgpu54.34472.463.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:afc73cfd2913a9332bda9226fe81e408c3a15ec2f074b1397507d4fc3ef16924
+size 269320

runs/train/p2/events.out.tfevents.1659133070.nefgpu54.1618.463.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3e479289d46f17b0cd2af8c4d9210b70e5cf5c0b4b8c8f9fa9f288aa7fcec6e8
+size 197908

runs/train/p2/events.out.tfevents.1659427595.nefgpu54.33725.463.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b704c973c7e26b0b484341b42cf4af42bd34d4de43950a8ef1d11568876f95d3
+size 14296

runs/train/p2/events.out.tfevents.1659993621.nefgpu54.15278.463.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e87c83148232ebd9dd3a75bc41b0868bc93bf307a863c263959ab586c53623a2
+size 42808

runs/training_summary.txt ADDED

	@@ -0,0 +1,24 @@

+{
+    "total_training_steps": 3400,
+    "train_loss": 7.284830570220947,
+    "last_train_metrics_train_perf": 266.8984680175781,
+    "last_train_metrics_total_loss": 7.284830570220947,
+    "last_train_metrics_masked_lm_accuracy": 0.6975675821304321,
+    "last_train_metrics_masked_lm_loss": 1.4581712484359741,
+    "last_train_metrics_sampled_masked_lm_accuracy": 0.6167887449264526,
+    "last_train_metrics_disc_loss": 0.1190570518374443,
+    "last_train_metrics_disc_auc": 0.0,
+    "last_train_metrics_disc_accuracy": 0.960852861404419,
+    "last_train_metrics_disc_precision": 0.7765898704528809,
+    "last_train_metrics_disc_recall": 0.38164031505584717,
+    "eval_metrics_train_perf": 0.0,
+    "eval_metrics_total_loss": 0.0,
+    "eval_metrics_masked_lm_accuracy": 0.0,
+    "eval_metrics_masked_lm_loss": 0.0,
+    "eval_metrics_sampled_masked_lm_accuracy": 0.0,
+    "eval_metrics_disc_loss": 0.0,
+    "eval_metrics_disc_auc": 0.0,
+    "eval_metrics_disc_accuracy": 0.0,
+    "eval_metrics_disc_precision": 0.0,
+    "eval_metrics_disc_recall": 0.0
+}
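
The summary reports discriminator precision and recall separately. The sketch below combines them into an F1 score. The F1 value is not part of the logged metrics; it is derived here only to put the high `disc_accuracy` in context, since most tokens in RTD are unreplaced and accuracy is dominated by that majority class.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from runs/training_summary.txt.
disc_precision = 0.7765898704528809
disc_recall = 0.38164031505584717

disc_f1 = f1_score(disc_precision, disc_recall)
print(f"discriminator F1 on replaced tokens: {disc_f1:.3f}")
```

Note also that every `eval_metrics_*` entry in the summary is 0.0, which suggests evaluation was not run or not recorded for this checkpoint.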

special_tokens_map.json ADDED

	@@ -0,0 +1 @@


+{"bos_token": "[CLS]", "eos_token": "[SEP]", "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}

spm.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:eaf6658b4c5f33b1a6092e07deec6f921e4c6e87bf3068d109a2f1fd44849b50
+size 808787

tf_model.h5 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d025d27d13d2430bd1857bf8198a5f1dab0a1ee41a95eacf25fbc13c5afcb45a
+size 238688520

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,16 @@

+{
+    "do_lower_case": false,
+    "bos_token": "[CLS]",
+    "eos_token": "[SEP]",
+    "unk_token": "[UNK]",
+    "sep_token": "[SEP]",
+    "pad_token": "[PAD]",
+    "cls_token": "[CLS]",
+    "mask_token": "[MASK]",
+    "split_by_punct": false,
+    "special_tokens_map_file": null,
+    "name_or_path": "vocab/camembert-deberta/",
+    "sp_model_kwargs": {},
+    "tokenizer_file": null,
+    "tokenizer_class": "DebertaV2Tokenizer"
+}