wissamantoun committed
Commit f4d93a6
1 Parent(s): 3a11161

Upload folder using huggingface_hub
README.md ADDED
@@ -0,0 +1,161 @@
+ ---
+ language: fr
+ license: mit
+ tags:
+ - roberta
+ - token-classification
+ base_model: almanach/camembertv2-base
+ datasets:
+ - GSD
+ metrics:
+ - las
+ - upos
+ model-index:
+ - name: almanach/camembertv2-base-gsd
+   results:
+   - task:
+       type: token-classification
+       name: Part-of-Speech Tagging
+     dataset:
+       type: GSD
+       name: GSD
+     metrics:
+     - name: upos
+       type: upos
+       value: 0.98662
+       verified: false
+   - task:
+       type: token-classification
+       name: Dependency Parsing
+     dataset:
+       type: GSD
+       name: GSD
+     metrics:
+     - name: las
+       type: las
+       value: 0.94317
+       verified: false
+ ---
+
+ # Model Card for almanach/camembertv2-base-gsd
+
+ almanach/camembertv2-base-gsd is a RoBERTa-based model for token classification, fine-tuned on the GSD dataset for Part-of-Speech tagging and dependency parsing.
+ It achieves a UPOS accuracy of 0.98662 and a labeled attachment score (LAS) of 0.94317 on the GSD dataset.
+
+ The model is part of the almanach/camembertv2-base family of fine-tuned models.
+
+ ## Model Details
+
+ ### Model Description
+
+ - **Developed by:** Wissam Antoun (PhD student at ALMAnaCH, Inria Paris)
+ - **Model type:** RoBERTa
+ - **Language(s) (NLP):** French
+ - **License:** MIT
+ - **Fine-tuned from model:** almanach/camembertv2-base
+
+ ### Model Sources
+
+ - **Repository:** https://github.com/WissamAntoun/camemberta
+ - **Paper:** https://arxiv.org/abs/2411.08868
+
+ ## Uses
+
+ The model can be used for French token classification tasks, namely Part-of-Speech tagging and dependency parsing.
+
+ ## Bias, Risks, and Limitations
+
+ The model inherits any biases present in its pre-training and fine-tuning data, and may not generalize well to domains or tasks beyond the GSD treebank it was trained on.
+
+ ## How to Get Started with the Model
+
+ You can use the model directly with the [hopsparser](https://github.com/hopsparser/hopsparser) library, for example in server mode (see https://github.com/hopsparser/hopsparser/blob/main/docs/server.md). A local-inference sketch follows below.
+
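+ The sketch below is a minimal, unofficial example. It assumes the `hopsparser parse` CLI (check the hopsparser docs for the exact arguments, in particular the `--raw` flag for untokenized input) and that the parser model lives in this repo's `model/` subfolder, as in the file list of this commit.
+
+ ```python
+ # Hedged sketch: download the repo, then parse raw French text with hopsparser.
+ # Assumptions flagged above: CLI name/flags and the model/ subfolder layout.
+ import subprocess
+ from huggingface_hub import snapshot_download
+
+ repo_path = snapshot_download("almanach/camembertv2-base-gsd")
+ model_dir = f"{repo_path}/model"  # parser weights + lexers
+
+ with open("input.txt", "w", encoding="utf-8") as f:
+     f.write("Le chat dort sur le canapé.\n")
+
+ # Writes CoNLL-U output (UPOS tags and dependency arcs) to output.conllu
+ subprocess.run(
+     ["hopsparser", "parse", "--raw", model_dir, "input.txt", "output.conllu"],
+     check=True,
+ )
+ print(open("output.conllu", encoding="utf-8").read())
+ ```
+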
+ ## Training Details
+
+ ### Training Procedure
+
+ The model was trained with the [hopsparser](https://github.com/hopsparser/hopsparser) library on the GSD dataset.
+
+ #### Training Hyperparameters
+
+ ```yml
+ # Layer dimensions
+ mlp_input: 1024
+ mlp_tag_hidden: 16
+ mlp_arc_hidden: 512
+ mlp_lab_hidden: 128
+ # Lexers
+ lexers:
+   - name: word_embeddings
+     type: words
+     embedding_size: 256
+     word_dropout: 0.5
+   - name: char_level_embeddings
+     type: chars_rnn
+     embedding_size: 64
+     lstm_output_size: 128
+   - name: fasttext
+     type: fasttext
+   - name: camembertv2_base_p2_17k_last_layer
+     type: bert
+     model: /scratch/camembertv2/runs/models/camembertv2-base-bf16/post/ckpt-p2-17000/pt/
+     layers: [11]
+     subwords_reduction: "mean"
+ # Training hyperparameters
+ encoder_dropout: 0.5
+ mlp_dropout: 0.5
+ batch_size: 8
+ epochs: 64
+ lr:
+   base: 0.00003
+   schedule:
+     shape: linear
+     warmup_steps: 100
+ ```
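+
+ The `lr` block requests a linear schedule with a 100-step warmup and a peak rate of 3e-5. Below is a sketch of the implied curve, assuming the common "warmup then linear decay to zero" reading of `shape: linear` (hopsparser's source is the authority on the exact semantics):
+
+ ```python
+ # Hypothetical reconstruction of the schedule; not hopsparser's actual code.
+ def lr_at_step(step: int, total_steps: int, base: float = 3e-5, warmup: int = 100) -> float:
+     if step < warmup:
+         return base * step / warmup  # linear warmup from 0 to base
+     # linear decay from base down to 0 at the final step
+     return base * max(0.0, (total_steps - step) / (total_steps - warmup))
+
+ # With batch_size=8 and roughly 14.5k French GSD training sentences, an epoch
+ # is about 1.8k steps, so 64 epochs is on the order of 116k steps (estimate):
+ total = 64 * 1800
+ for s in (0, 50, 100, total // 2, total - 1):
+     print(s, f"{lr_at_step(s, total):.2e}")
+ ```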
+
+ #### Results
+
+ - **UPOS:** 0.98662
+ - **LAS:** 0.94317
+
+ ## Technical Specifications
+
+ ### Model Architecture and Objective
+
+ A custom biaffine tagger-parser built on a RoBERTa encoder, used here for token classification (POS tagging and dependency parsing).
+
+ ## Citation
+
+ **BibTeX:**
+
+ ```bibtex
+ @misc{antoun2024camembert20smarterfrench,
+   title={CamemBERT 2.0: A Smarter French Language Model Aged to Perfection},
+   author={Wissam Antoun and Francis Kulumba and Rian Touchent and Éric de la Clergerie and Benoît Sagot and Djamé Seddah},
+   year={2024},
+   eprint={2411.08868},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL},
+   url={https://arxiv.org/abs/2411.08868},
+ }
+
+ @inproceedings{grobol:hal-03223424,
+   title = {Analyse en dépendances du français avec des plongements contextualisés},
+   author = {Grobol, Loïc and Crabbé, Benoît},
+   url = {https://hal.archives-ouvertes.fr/hal-03223424},
+   booktitle = {Actes de la 28ème Conférence sur le Traitement Automatique des Langues Naturelles},
+   eventtitle = {TALN-RÉCITAL 2021},
+   venue = {Lille, France},
+   pdf = {https://hal.archives-ouvertes.fr/hal-03223424/file/HOPS_final.pdf},
+   hal_id = {hal-03223424},
+   hal_version = {v1},
+ }
+ ```
camembertv2_base_p2_17k_last_layer.yaml ADDED
@@ -0,0 +1,32 @@
+ # Layer dimensions
+ mlp_input: 1024
+ mlp_tag_hidden: 16
+ mlp_arc_hidden: 512
+ mlp_lab_hidden: 128
+ # Lexers
+ lexers:
+   - name: word_embeddings
+     type: words
+     embedding_size: 256
+     word_dropout: 0.5
+   - name: char_level_embeddings
+     type: chars_rnn
+     embedding_size: 64
+     lstm_output_size: 128
+   - name: fasttext
+     type: fasttext
+   - name: camembertv2_base_p2_17k_last_layer
+     type: bert
+     model: /scratch/camembertv2/runs/models/camembertv2-base-bf16/post/ckpt-p2-17000/pt/
+     layers: [11]
+     subwords_reduction: "mean"
+ # Training hyperparameters
+ encoder_dropout: 0.5
+ mlp_dropout: 0.5
+ batch_size: 8
+ epochs: 64
+ lr:
+   base: 0.00003
+   schedule:
+     shape: linear
+     warmup_steps: 100
fr_gsd-ud-dev.parsed.conllu ADDED
The diff for this file is too large to render. See raw diff
 
fr_gsd-ud-test.parsed.conllu ADDED
The diff for this file is too large to render. See raw diff
 
model/config.json ADDED
@@ -0,0 +1 @@
+ {"mlp_input": 1024, "mlp_tag_hidden": 16, "mlp_arc_hidden": 512, "mlp_lab_hidden": 128, "biased_biaffine": true, "default_batch_size": 8, "encoder_dropout": 0.5, "extra_annotations": {}, "labels": ["acl", "acl:relcl", "advcl", "advcl:cleft", "advmod", "amod", "appos", "aux:caus", "aux:pass", "aux:tense", "case", "cc", "ccomp", "compound", "conj", "cop", "csubj", "csubj:pass", "dep", "dep:comp", "det", "discourse", "dislocated", "expl", "expl:pass", "expl:pv", "expl:subj", "fixed", "flat", "flat:foreign", "flat:name", "goeswith", "iobj", "iobj:agent", "mark", "nmod", "nsubj", "nsubj:caus", "nsubj:pass", "nummod", "obj", "obj:agent", "obj:lvc", "obl", "obl:agent", "obl:arg", "obl:mod", "orphan", "parataxis", "punct", "reparandum", "root", "vocative", "xcomp"], "mlp_dropout": 0.5, "tagset": ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"], "lexers": {"word_embeddings": "words", "char_level_embeddings": "chars_rnn", "fasttext": "fasttext", "camembertv2_base_p2_17k_last_layer": "bert"}, "multitask_loss": "sum"}
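This config defines the parser's full prediction space (the UPOS tagset and the dependency-relation labels) plus the lexer inventory. A quick way to inspect it, using `huggingface_hub` with the file path taken from this commit:

```python
# Inspect the parser head's tagset and dependency labels from model/config.json.
import json
from huggingface_hub import hf_hub_download

cfg_path = hf_hub_download("almanach/camembertv2-base-gsd", "model/config.json")
with open(cfg_path, encoding="utf-8") as f:
    cfg = json.load(f)

print(len(cfg["tagset"]), "UPOS tags, e.g.", cfg["tagset"][:5])
print(len(cfg["labels"]), "dependency relations, e.g.", cfg["labels"][:5])
print("lexers:", cfg["lexers"])  # maps each lexer name to its type
```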
model/lexers/camembertv2_base_p2_17k_last_layer/config.json ADDED
@@ -0,0 +1 @@
+ {"layers": [11], "subwords_reduction": "mean", "weight_layers": false}
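With `"weight_layers": false`, only layer 11 of the encoder is used, and `"subwords_reduction": "mean"` means each word's vector is the average of its subword embeddings. A self-contained sketch of that reduction (assumed semantics; hopsparser's BertLexer is the reference implementation):

```python
# Mean-pool subword embeddings into one vector per word (illustrative only).
import torch

def mean_reduce(subword_embs: torch.Tensor, word_ids: list) -> torch.Tensor:
    """subword_embs: (n_subwords, dim); word_ids: word index (or None) per subword."""
    n_words = max(w for w in word_ids if w is not None) + 1
    out = torch.zeros(n_words, subword_embs.size(-1))
    counts = torch.zeros(n_words, 1)
    for pos, w in enumerate(word_ids):
        if w is None:          # special tokens like [CLS]/[SEP] map to no word
            continue
        out[w] += subword_embs[pos]
        counts[w] += 1
    return out / counts

embs = torch.randn(7, 768)              # 7 subwords, hidden size 768
word_ids = [None, 0, 1, 1, 2, 3, None]  # as returned by a fast tokenizer's word_ids()
print(mean_reduce(embs, word_ids).shape)  # torch.Size([4, 768])
```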
model/lexers/camembertv2_base_p2_17k_last_layer/model/config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "_name_or_path": "/scratch/camembertv2/runs/models/camembertv2-base-bf16/post/ckpt-p2-17000/pt/",
+   "architectures": [
+     "RobertaForMaskedLM"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 1,
+   "classifier_dropout": null,
+   "embedding_size": 768,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-07,
+   "max_position_embeddings": 1025,
+   "model_name": "camembertv2-base-bf16",
+   "model_type": "roberta",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_biased_input": true,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.44.2",
+   "type_vocab_size": 1,
+   "use_cache": true,
+   "vocab_size": 32768
+ }
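This is a standard 12-layer RoBERTa config (hidden size 768, vocab 32768, max positions 1025) inherited from camembertv2-base. The lexer directory ships the encoder's tokenizer files but no standalone weight file (the encoder parameters live inside `model/weights.pt`), so only the tokenizer appears directly loadable from this subfolder. A hedged sketch, with the subfolder path read off this commit's file list:

```python
# Load just the lexer's tokenizer from the repo subfolder (assumption: the
# subfolder contains a complete standalone tokenizer, per the files above).
from transformers import AutoTokenizer

sub = "model/lexers/camembertv2_base_p2_17k_last_layer/model"
tok = AutoTokenizer.from_pretrained("almanach/camembertv2-base-gsd", subfolder=sub)

enc = tok("Le chat dort sur le canapé.")
print(tok.convert_ids_to_tokens(enc["input_ids"]))  # [CLS] ... [SEP] framing
```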
model/lexers/camembertv2_base_p2_17k_last_layer/model/special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
model/lexers/camembertv2_base_p2_17k_last_layer/model/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
model/lexers/camembertv2_base_p2_17k_last_layer/model/tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "add_prefix_space": true,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "4": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "[CLS]",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "eos_token": "[SEP]",
+   "errors": "replace",
+   "mask_token": "[MASK]",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "tokenizer_class": "RobertaTokenizer",
+   "trim_offsets": true,
+   "unk_token": "[UNK]"
+ }
model/lexers/char_level_embeddings/config.json ADDED
@@ -0,0 +1 @@
+ {"char_embeddings_dim": 64, "output_dim": 128, "special_tokens": ["<root>"], "charset": ["<pad>", "<special>", " ", "!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",", "-", ".", "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "=", ">", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[", "]", "^", "_", "`", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "{", "|", "}", "\u00a3", "\u00ab", "\u00b0", "\u00b1", "\u00b2", "\u00b3", "\u00b7", "\u00ba", "\u00bb", "\u00c0", "\u00c1", "\u00c2", "\u00c5", "\u00c6", "\u00c7", "\u00c8", "\u00c9", "\u00ca", "\u00cd", "\u00ce", "\u00d3", "\u00d4", "\u00d6", "\u00d7", "\u00d9", "\u00da", "\u00dc", "\u00df", "\u00e0", "\u00e1", "\u00e2", "\u00e3", "\u00e4", "\u00e5", "\u00e6", "\u00e7", "\u00e8", "\u00e9", "\u00ea", "\u00eb", "\u00ec", "\u00ed", "\u00ee", "\u00ef", "\u00f0", "\u00f1", "\u00f2", "\u00f3", "\u00f4", "\u00f6", "\u00f8", "\u00f9", "\u00fa", "\u00fb", "\u00fc", "\u00fd", "\u00ff", "\u0101", "\u0103", "\u0105", "\u0107", "\u010c", "\u010d", "\u0119", "\u011b", "\u011f", "\u0123", "\u012b", "\u012d", "\u0131", "\u013d", "\u013e", "\u0141", "\u0142", "\u0144", "\u0148", "\u014c", "\u014d", "\u0151", "\u0153", "\u0159", "\u015b", "\u015f", "\u0160", "\u0161", "\u0163", "\u0169", "\u016b", "\u017b", "\u017c", "\u017d", "\u017e", "\u01b0", "\u025f", "\u0268", "\u0274", "\u0282", "\u02bf", "\u0301", "\u0361", "\u03a9", "\u03b3", "\u03b5", "\u03c9", "\u0409", "\u040f", "\u0410", "\u0411", "\u0412", "\u0413", "\u0414", "\u0418", "\u041b", "\u041c", "\u041e", "\u041f", "\u0420", "\u0421", "\u0422", "\u0424", "\u0428", "\u0430", "\u0431", "\u0432", "\u0433", "\u0434", "\u0435", "\u0436", "\u0437", "\u0438", "\u0439", "\u043a", "\u043b", "\u043c", "\u043d", "\u043e", "\u043f", "\u0440", "\u0441", "\u0442", "\u0443", "\u0445", "\u0446", "\u0447", "\u0448", "\u0449", "\u044a", "\u044c", "\u044f", "\u0451", "\u0458", "\u0459", "\u045a", "\u045b", "\u0627", "\u062c", "\u062f", "\u0630", "\u0631", "\u0634", "\u0643", "\u0644", "\u0645", "\u0646", "\u1e0f", "\u1e25", "\u1e92", "\u1ea3", "\u1ead", "\u1ec5", "\u1edd", "\u1edf", "\u1ee7", "\u1ef1", "\u2013", "\u2014", "\u2020", "\u2032", "\u2082", "\u20ac", "\u2212", "\u25b6", "\u4e0a", "\u4e2d", "\u4e34", "\u4e49", "\u4e59", "\u4e95", "\u4ecb", "\u4f0e", "\u5247", "\u53f7", "\u56db", "\u5712", "\u5927", "\u5b89", "\u5bae", "\u5bbf", "\u5f81", "\u614e", "\u697d", "\u6d4e", "\u706b", "\u7384", "\u7530", "\u753a", "\u7bad", "\u7c89", "\u80e1", "\u82a6", "\u85e9", "\u898f", "\u90e8", "\u957f", "\uac15", "\uc131", "\ud638"]}
model/lexers/fasttext/config.json ADDED
@@ -0,0 +1 @@
+ {"special_tokens": ["<root>"]}
model/lexers/fasttext/fasttext_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:58d2303aea9428dc9ce793512d8d164f54b8947662eeb44655b24a35d8b2f5bd
+ size 805269874
model/lexers/word_embeddings/config.json ADDED
The diff for this file is too large to render. See raw diff
 
model/weights.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e1833f89de154a052d34875e260df9a0074a4645ec433b7089f075e801aa9d03
+ size 1815845162
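The large binaries are stored as Git LFS pointers that record the blob's sha256 and byte size. A minimal integrity check for a downloaded copy of `model/weights.pt`, using the hash from the pointer above:

```python
# Verify a downloaded file against the sha256 recorded in its LFS pointer.
import hashlib

def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

EXPECTED = "e1833f89de154a052d34875e260df9a0074a4645ec433b7089f075e801aa9d03"
print(sha256_of("model/weights.pt") == EXPECTED)  # True if the download is intact
```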
train.log ADDED
@@ -0,0 +1,101 @@
+ [hops] 2024-09-23 22:03:10.261 | INFO | Initializing a parser from /workspace/configs/exp_camembertv2/camembertv2_base_p2_17k_last_layer.yaml
+ [hops] 2024-09-23 22:03:10.554 | INFO | Generating a FastText model from the treebank
+ [hops] 2024-09-23 22:03:10.645 | INFO | Training fasttext model
+ [hops] 2024-09-23 22:03:12.421 | WARNING | Some weights of RobertaModel were not initialized from the model checkpoint at /scratch/camembertv2/runs/models/camembertv2-base-bf16/post/ckpt-p2-17000/pt/ and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
+ You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
+ [hops] 2024-09-23 22:03:24.938 | INFO | Start training on cuda:0
+ [hops] 2024-09-23 22:03:24.944 | WARNING | You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
+ [hops] 2024-09-23 22:04:54.185 | INFO | Epoch 0: train loss 1.1740 dev loss 0.3235 dev tag acc 93.16% dev head acc 88.90% dev deprel acc 91.95%
+ [hops] 2024-09-23 22:04:54.186 | INFO | New best model: head accuracy 88.90% > 0.00%
+ [hops] 2024-09-23 22:06:23.342 | INFO | Epoch 1: train loss 0.2838 dev loss 0.1626 dev tag acc 97.85% dev head acc 94.35% dev deprel acc 95.43%
+ [hops] 2024-09-23 22:06:23.343 | INFO | New best model: head accuracy 94.35% > 88.90%
+ [hops] 2024-09-23 22:07:57.103 | INFO | Epoch 2: train loss 0.1673 dev loss 0.1337 dev tag acc 98.24% dev head acc 95.40% dev deprel acc 96.71%
+ [hops] 2024-09-23 22:07:57.104 | INFO | New best model: head accuracy 95.40% > 94.35%
+ [hops] 2024-09-23 22:09:24.832 | INFO | Epoch 3: train loss 0.1227 dev loss 0.1438 dev tag acc 98.43% dev head acc 95.90% dev deprel acc 96.81%
+ [hops] 2024-09-23 22:09:24.833 | INFO | New best model: head accuracy 95.90% > 95.40%
+ [hops] 2024-09-23 22:10:55.835 | INFO | Epoch 4: train loss 0.0968 dev loss 0.1418 dev tag acc 98.43% dev head acc 96.14% dev deprel acc 97.20%
+ [hops] 2024-09-23 22:10:55.836 | INFO | New best model: head accuracy 96.14% > 95.90%
+ [hops] 2024-09-23 22:12:27.472 | INFO | Epoch 5: train loss 0.0793 dev loss 0.1568 dev tag acc 98.52% dev head acc 96.19% dev deprel acc 97.34%
+ [hops] 2024-09-23 22:12:27.473 | INFO | New best model: head accuracy 96.19% > 96.14%
+ [hops] 2024-09-23 22:13:56.231 | INFO | Epoch 6: train loss 0.0667 dev loss 0.1516 dev tag acc 98.57% dev head acc 96.40% dev deprel acc 97.36%
+ [hops] 2024-09-23 22:13:56.232 | INFO | New best model: head accuracy 96.40% > 96.19%
+ [hops] 2024-09-23 22:15:26.147 | INFO | Epoch 7: train loss 0.0566 dev loss 0.1687 dev tag acc 98.58% dev head acc 96.52% dev deprel acc 97.52%
+ [hops] 2024-09-23 22:15:26.148 | INFO | New best model: head accuracy 96.52% > 96.40%
+ [hops] 2024-09-23 22:16:55.419 | INFO | Epoch 8: train loss 0.0500 dev loss 0.1826 dev tag acc 98.64% dev head acc 96.53% dev deprel acc 97.42%
+ [hops] 2024-09-23 22:16:55.420 | INFO | New best model: head accuracy 96.53% > 96.52%
+ [hops] 2024-09-23 22:18:28.485 | INFO | Epoch 9: train loss 0.0425 dev loss 0.1906 dev tag acc 98.59% dev head acc 96.56% dev deprel acc 97.56%
+ [hops] 2024-09-23 22:18:28.486 | INFO | New best model: head accuracy 96.56% > 96.53%
+ [hops] 2024-09-23 22:20:02.390 | INFO | Epoch 10: train loss 0.0377 dev loss 0.2151 dev tag acc 98.55% dev head acc 96.60% dev deprel acc 97.51%
+ [hops] 2024-09-23 22:20:02.390 | INFO | New best model: head accuracy 96.60% > 96.56%
+ [hops] 2024-09-23 22:21:31.071 | INFO | Epoch 11: train loss 0.0332 dev loss 0.2276 dev tag acc 98.59% dev head acc 96.62% dev deprel acc 97.53%
+ [hops] 2024-09-23 22:21:31.072 | INFO | New best model: head accuracy 96.62% > 96.60%
+ [hops] 2024-09-23 22:22:58.744 | INFO | Epoch 12: train loss 0.0299 dev loss 0.2397 dev tag acc 98.59% dev head acc 96.62% dev deprel acc 97.49%
+ [hops] 2024-09-23 22:22:58.745 | INFO | New best model: head accuracy 96.62% > 96.62%
+ [hops] 2024-09-23 22:24:30.195 | INFO | Epoch 13: train loss 0.0270 dev loss 0.2548 dev tag acc 98.64% dev head acc 96.45% dev deprel acc 97.61%
+ [hops] 2024-09-23 22:25:58.937 | INFO | Epoch 14: train loss 0.0247 dev loss 0.2351 dev tag acc 98.69% dev head acc 96.51% dev deprel acc 97.60%
+ [hops] 2024-09-23 22:27:26.485 | INFO | Epoch 15: train loss 0.0219 dev loss 0.2812 dev tag acc 98.64% dev head acc 96.60% dev deprel acc 97.63%
+ [hops] 2024-09-23 22:28:53.871 | INFO | Epoch 16: train loss 0.0204 dev loss 0.2771 dev tag acc 98.64% dev head acc 96.70% dev deprel acc 97.59%
+ [hops] 2024-09-23 22:28:53.872 | INFO | New best model: head accuracy 96.70% > 96.62%
+ [hops] 2024-09-23 22:30:22.009 | INFO | Epoch 17: train loss 0.0193 dev loss 0.2966 dev tag acc 98.57% dev head acc 96.71% dev deprel acc 97.54%
+ [hops] 2024-09-23 22:30:22.010 | INFO | New best model: head accuracy 96.71% > 96.70%
+ [hops] 2024-09-23 22:31:50.178 | INFO | Epoch 18: train loss 0.0172 dev loss 0.3181 dev tag acc 98.65% dev head acc 96.63% dev deprel acc 97.61%
+ [hops] 2024-09-23 22:33:18.205 | INFO | Epoch 19: train loss 0.0163 dev loss 0.3030 dev tag acc 98.66% dev head acc 96.73% dev deprel acc 97.62%
+ [hops] 2024-09-23 22:33:18.206 | INFO | New best model: head accuracy 96.73% > 96.71%
+ [hops] 2024-09-23 22:34:52.436 | INFO | Epoch 20: train loss 0.0150 dev loss 0.3732 dev tag acc 98.64% dev head acc 96.74% dev deprel acc 97.44%
+ [hops] 2024-09-23 22:34:52.437 | INFO | New best model: head accuracy 96.74% > 96.73%
+ [hops] 2024-09-23 22:36:26.028 | INFO | Epoch 21: train loss 0.0139 dev loss 0.3404 dev tag acc 98.59% dev head acc 96.74% dev deprel acc 97.57%
+ [hops] 2024-09-23 22:36:26.029 | INFO | New best model: head accuracy 96.74% > 96.74%
+ [hops] 2024-09-23 22:37:56.614 | INFO | Epoch 22: train loss 0.0130 dev loss 0.3795 dev tag acc 98.66% dev head acc 96.59% dev deprel acc 97.59%
+ [hops] 2024-09-23 22:39:24.997 | INFO | Epoch 23: train loss 0.0120 dev loss 0.3572 dev tag acc 98.70% dev head acc 96.67% dev deprel acc 97.71%
+ [hops] 2024-09-23 22:40:54.945 | INFO | Epoch 24: train loss 0.0114 dev loss 0.3795 dev tag acc 98.65% dev head acc 96.71% dev deprel acc 97.69%
+ [hops] 2024-09-23 22:42:25.287 | INFO | Epoch 25: train loss 0.0113 dev loss 0.3792 dev tag acc 98.57% dev head acc 96.60% dev deprel acc 97.59%
+ [hops] 2024-09-23 22:43:52.396 | INFO | Epoch 26: train loss 0.0105 dev loss 0.3807 dev tag acc 98.69% dev head acc 96.61% dev deprel acc 97.63%
+ [hops] 2024-09-23 22:45:20.429 | INFO | Epoch 27: train loss 0.0093 dev loss 0.4159 dev tag acc 98.66% dev head acc 96.71% dev deprel acc 97.65%
+ [hops] 2024-09-23 22:46:51.804 | INFO | Epoch 28: train loss 0.0088 dev loss 0.4024 dev tag acc 98.56% dev head acc 96.68% dev deprel acc 97.59%
+ [hops] 2024-09-23 22:48:21.306 | INFO | Epoch 29: train loss 0.0084 dev loss 0.4070 dev tag acc 98.58% dev head acc 96.69% dev deprel acc 97.66%
+ [hops] 2024-09-23 22:49:52.685 | INFO | Epoch 30: train loss 0.0085 dev loss 0.4418 dev tag acc 98.58% dev head acc 96.70% dev deprel acc 97.64%
+ [hops] 2024-09-23 22:51:21.719 | INFO | Epoch 31: train loss 0.0077 dev loss 0.4297 dev tag acc 98.62% dev head acc 96.67% dev deprel acc 97.66%
+ [hops] 2024-09-23 22:52:56.380 | INFO | Epoch 32: train loss 0.0070 dev loss 0.4392 dev tag acc 98.63% dev head acc 96.63% dev deprel acc 97.71%
+ [hops] 2024-09-23 22:54:24.344 | INFO | Epoch 33: train loss 0.0065 dev loss 0.5069 dev tag acc 98.69% dev head acc 96.65% dev deprel acc 97.61%
+ [hops] 2024-09-23 22:55:56.289 | INFO | Epoch 34: train loss 0.0066 dev loss 0.4738 dev tag acc 98.64% dev head acc 96.57% dev deprel acc 97.58%
+ [hops] 2024-09-23 22:57:26.001 | INFO | Epoch 35: train loss 0.0059 dev loss 0.4935 dev tag acc 98.60% dev head acc 96.62% dev deprel acc 97.57%
+ [hops] 2024-09-23 22:58:52.412 | INFO | Epoch 36: train loss 0.0056 dev loss 0.5007 dev tag acc 98.65% dev head acc 96.57% dev deprel acc 97.55%
+ [hops] 2024-09-23 23:00:21.973 | INFO | Epoch 37: train loss 0.0053 dev loss 0.5094 dev tag acc 98.60% dev head acc 96.71% dev deprel acc 97.54%
+ [hops] 2024-09-23 23:01:50.675 | INFO | Epoch 38: train loss 0.0051 dev loss 0.4747 dev tag acc 98.61% dev head acc 96.73% dev deprel acc 97.57%
+ [hops] 2024-09-23 23:03:21.971 | INFO | Epoch 39: train loss 0.0048 dev loss 0.5596 dev tag acc 98.65% dev head acc 96.73% dev deprel acc 97.65%
+ [hops] 2024-09-23 23:04:50.664 | INFO | Epoch 40: train loss 0.0043 dev loss 0.4880 dev tag acc 98.67% dev head acc 96.79% dev deprel acc 97.69%
+ [hops] 2024-09-23 23:04:50.665 | INFO | New best model: head accuracy 96.79% > 96.74%
+ [hops] 2024-09-23 23:06:18.898 | INFO | Epoch 41: train loss 0.0041 dev loss 0.5152 dev tag acc 98.69% dev head acc 96.68% dev deprel acc 97.65%
+ [hops] 2024-09-23 23:07:48.810 | INFO | Epoch 42: train loss 0.0042 dev loss 0.5796 dev tag acc 98.62% dev head acc 96.77% dev deprel acc 97.59%
+ [hops] 2024-09-23 23:09:19.338 | INFO | Epoch 43: train loss 0.0039 dev loss 0.5478 dev tag acc 98.66% dev head acc 96.69% dev deprel acc 97.69%
+ [hops] 2024-09-23 23:10:49.453 | INFO | Epoch 44: train loss 0.0034 dev loss 0.5761 dev tag acc 98.66% dev head acc 96.71% dev deprel acc 97.64%
+ [hops] 2024-09-23 23:12:20.508 | INFO | Epoch 45: train loss 0.0035 dev loss 0.5968 dev tag acc 98.64% dev head acc 96.75% dev deprel acc 97.63%
+ [hops] 2024-09-23 23:13:49.566 | INFO | Epoch 46: train loss 0.0032 dev loss 0.5657 dev tag acc 98.64% dev head acc 96.78% dev deprel acc 97.69%
+ [hops] 2024-09-23 23:15:19.300 | INFO | Epoch 47: train loss 0.0029 dev loss 0.6033 dev tag acc 98.67% dev head acc 96.72% dev deprel acc 97.68%
+ [hops] 2024-09-23 23:16:46.854 | INFO | Epoch 48: train loss 0.0029 dev loss 0.6110 dev tag acc 98.67% dev head acc 96.72% dev deprel acc 97.68%
+ [hops] 2024-09-23 23:18:14.068 | INFO | Epoch 49: train loss 0.0026 dev loss 0.6084 dev tag acc 98.68% dev head acc 96.74% dev deprel acc 97.67%
+ [hops] 2024-09-23 23:19:44.122 | INFO | Epoch 50: train loss 0.0025 dev loss 0.6095 dev tag acc 98.62% dev head acc 96.76% dev deprel acc 97.68%
+ [hops] 2024-09-23 23:21:10.264 | INFO | Epoch 51: train loss 0.0025 dev loss 0.6551 dev tag acc 98.69% dev head acc 96.71% dev deprel acc 97.73%
+ [hops] 2024-09-23 23:22:41.212 | INFO | Epoch 52: train loss 0.0022 dev loss 0.6374 dev tag acc 98.62% dev head acc 96.70% dev deprel acc 97.64%
+ [hops] 2024-09-23 23:24:09.182 | INFO | Epoch 53: train loss 0.0021 dev loss 0.6473 dev tag acc 98.64% dev head acc 96.72% dev deprel acc 97.64%
+ [hops] 2024-09-23 23:25:37.902 | INFO | Epoch 54: train loss 0.0019 dev loss 0.6793 dev tag acc 98.66% dev head acc 96.73% dev deprel acc 97.67%
+ [hops] 2024-09-23 23:27:06.796 | INFO | Epoch 55: train loss 0.0019 dev loss 0.6544 dev tag acc 98.66% dev head acc 96.76% dev deprel acc 97.70%
+ [hops] 2024-09-23 23:28:34.813 | INFO | Epoch 56: train loss 0.0016 dev loss 0.7122 dev tag acc 98.66% dev head acc 96.69% dev deprel acc 97.67%
+ [hops] 2024-09-23 23:30:01.304 | INFO | Epoch 57: train loss 0.0015 dev loss 0.7413 dev tag acc 98.65% dev head acc 96.69% dev deprel acc 97.68%
+ [hops] 2024-09-23 23:31:30.980 | INFO | Epoch 58: train loss 0.0015 dev loss 0.7386 dev tag acc 98.66% dev head acc 96.71% dev deprel acc 97.68%
+ [hops] 2024-09-23 23:33:00.851 | INFO | Epoch 59: train loss 0.0014 dev loss 0.7433 dev tag acc 98.65% dev head acc 96.80% dev deprel acc 97.68%
+ [hops] 2024-09-23 23:33:00.852 | INFO | New best model: head accuracy 96.80% > 96.79%
+ [hops] 2024-09-23 23:34:32.878 | INFO | Epoch 60: train loss 0.0014 dev loss 0.7406 dev tag acc 98.64% dev head acc 96.76% dev deprel acc 97.67%
+ [hops] 2024-09-23 23:36:01.771 | INFO | Epoch 61: train loss 0.0014 dev loss 0.7765 dev tag acc 98.63% dev head acc 96.75% dev deprel acc 97.65%
+ [hops] 2024-09-23 23:37:32.476 | INFO | Epoch 62: train loss 0.0012 dev loss 0.7706 dev tag acc 98.64% dev head acc 96.73% dev deprel acc 97.67%
+ [hops] 2024-09-23 23:38:59.167 | INFO | Epoch 63: train loss 0.0012 dev loss 0.7670 dev tag acc 98.64% dev head acc 96.74% dev deprel acc 97.67%
+ [hops] 2024-09-23 23:39:04.307 | WARNING | You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
+ [hops] 2024-09-23 23:39:12.221 | WARNING | You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
+ [hops] 2024-09-23 23:39:13.462 | INFO | Metrics for GSD-camembertv2_base_p2_17k_last_layer+rand_seed=42
+ ───────────────────────────────
+ Split UPOS UAS LAS
+ ───────────────────────────────
+ Dev 98.65 96.81 95.66
+ Test 98.66 95.77 94.32
+ ───────────────────────────────
+
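The log selects checkpoints on dev head accuracy (the "New best model" lines). A small helper to recover the per-epoch dev curve from a log in exactly this format:

```python
# Parse hopsparser's per-epoch log lines (format as shown above).
import re

PAT = re.compile(
    r"Epoch (\d+): train loss ([\d.]+) dev loss ([\d.]+) "
    r"dev tag acc ([\d.]+)% dev head acc ([\d.]+)% dev deprel acc ([\d.]+)%"
)

def dev_scores(log_path: str):
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            if m := PAT.search(line):
                epoch, *vals = m.groups()
                yield int(epoch), [float(v) for v in vals]

# Best epoch by dev head accuracy, the selection criterion used above:
best_epoch, best_vals = max(dev_scores("train.log"), key=lambda t: t[1][3])
print(best_epoch, best_vals)  # epoch 59 for this run (head acc 96.80)
```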