yhavinga committed on
Commit
043ba5a
1 Parent(s): addf75d
README.md ADDED
@@ -0,0 +1,188 @@
1
+
2
+ ---
3
+ language:
4
+ - nl
5
+ license: apache-2.0
6
+ tags:
7
+ - dutch
8
+ - t5
9
+ - t5x
10
+ - ul2
11
+ - seq2seq
12
+ datasets:
13
+ - yhavinga/mc4_nl_cleaned
14
+ - yhavinga/nedd_wiki_news
15
+ inference: false
16
+ ---
17
+
18
+ # ul2-large-dutch for Dutch
19
+
20
+ Pretrained T5 model on Dutch using a UL2 (Mixture-of-Denoisers) objective.
21
+ The T5 model was introduced in
22
+ [this paper](https://arxiv.org/abs/1910.10683)
23
+ and first released at [this page](https://github.com/google-research/text-to-text-transfer-transformer).
24
+ The UL2 objective was introduced in
25
+ [this paper](https://arxiv.org/abs/2205.05131)
26
+ and first released at [this page](https://github.com/google-research/google-research/tree/master/ul2).
27
+
28
+ **Note:** The Hugging Face inference widget is deactivated because this model needs a text-to-text fine-tuning on
29
+ a specific downstream task to be useful in practice.
30
+
31
+ ## Model description
32
+
33
+ T5 is an encoder-decoder model and treats all NLP problems in a text-to-text format.
34
+ `ul2-large-dutch` T5 is a transformers model pretrained on a very large corpus of
35
+ Dutch data in a self-supervised fashion.
36
+ This means it was pretrained on the raw texts only, with no humans labelling them in any way
37
+ (which is why it can use lots of publicly available data) with an automatic process to generate
38
+ inputs and outputs from those texts.
39
+
40
+
41
+ This model used the [T5 v1.1](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) improvements compared to the original T5 model during the pretraining:
42
+ - GEGLU activation in the feed-forward hidden layer, rather than ReLU - see [here](https://arxiv.org/abs/2002.05202)
43
+ - Dropout was turned off during pre-training. Dropout should be re-enabled during fine-tuning
44
+ - Pre-trained on self-supervised objective only without mixing in the downstream tasks
45
+ - No parameter sharing between embedding and classifier layer
46
+
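To make the GEGLU bullet concrete, here is a minimal pure-Python sketch of a gated-GELU feed-forward block. It assumes the layout used for T5 v1.1 style models (two bias-free input projections, here called `wi_0` and `wi_1`, and one output projection `wo`) and the tanh approximation of GELU that the model table lists as `gelu_new`. The helper names and the toy weights are illustrative only and are not taken from this repository.

```python
import math


def gelu_new(x: float) -> float:
    """Tanh approximation of GELU (assumed to match the `gelu_new` activation)."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def matvec(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector, without bias."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def geglu_feed_forward(x, wi_0, wi_1, wo):
    """Gated feed-forward: wo @ (gelu(wi_0 @ x) * (wi_1 @ x))."""
    gate = [gelu_new(h) for h in matvec(wi_0, x)]    # non-linear branch
    linear = matvec(wi_1, x)                         # linear branch
    hidden = [g * l for g, l in zip(gate, linear)]   # element-wise gating
    return matvec(wo, hidden)


# Toy sizes (d_model=2, d_ff=3); the real model uses d_model=1024 and d_ff=2816.
x = [1.0, -2.0]
wi_0 = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]
wi_1 = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
wo = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
y = geglu_feed_forward(x, wi_0, wi_1, wo)
```

Compared with a plain ReLU feed-forward layer, the extra linear branch adds a third weight matrix, which is presumably why `d_ff` is smaller than four times `d_model` in these configurations.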
47
+
48
+
49
+ ### UL2 pretraining objective
50
+
51
+ This model was pretrained with UL2's Mixture-of-Denoisers (MoD) objective, which combines diverse pre-training
52
+ paradigms together. UL2 frames different objective functions for training language models as denoising tasks, where
53
+ the model has to recover missing sub-sequences of a given input. During pre-training it uses a novel mixture-of-denoisers
54
+ that samples from a varied set of such objectives, each with different configurations. UL2 is trained using a mixture of
55
+ three denoising tasks:
56
+
57
+ 1. R-denoising (or regular span corruption), which emulates the standard T5 span corruption objective;
58
+ 2. X-denoising (or extreme span corruption); and
59
+ 3. S-denoising (or sequential PrefixLM).
60
+
61
+ During pre-training, we sample from the available denoising tasks based on user-specified ratios.
62
+ UL2 introduces a notion of mode switching, wherein downstream fine-tuning is associated with a specific pre-training
63
+ denoising task. During pre-training, a paradigm token is inserted into the input
64
+ (`[NLU]` for R-denoising, `[NLG]` for X-denoising, or `[S2S]` for S-denoising) indicating the denoising task at hand.
65
+ Then, during fine-tuning the same input token should be inserted to get the best performance for different downstream
66
+ fine-tuning tasks.
67
+
68
+ ## Intended uses & limitations
69
+
70
+ This model was only pretrained in a self-supervised way excluding any supervised training.
71
+ Therefore, this model has to be fine-tuned before it is usable on a downstream task,
72
+ like text classification, unlike Google's original T5 model.
73
+
74
+ **Note:** You most likely need to fine-tune these T5/UL2 models without mixed precision,
75
+ so fine-tune them with full fp32 precision. Fine-tuning with Flax in bf16 - `model.to_bf16()` - is possible
76
+ if you set the mask correctly to exclude layernorm and embedding layers. Also note that the T5x pre-training
77
+ and fine-tuning configs set `z_loss` to 1e-4, which is used to keep the loss scale from underflowing.
78
+ You can also find more fine-tuning tips [here](https://discuss.huggingface.co/t/t5-finetuning-tips), for example.
79
+
80
+ **Note**: For fine-tuning, most likely you can get better results if you insert a prefix token
81
+ of `[NLU]`, `[NLG]`, or `[S2S]` to your input texts.
82
+ For general language understanding fine-tuning tasks, you could use the `[NLU]` token.
83
+ For GPT-style causal language generation, you could use the `[S2S]` token.
84
+ The `[NLG]` token of the X-denoising pretraining task is somewhat of a mix between language understanding and causal language
85
+ generation, so `[NLG]` could possibly also be used for language generation fine-tuning.
86
+
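As a minimal sketch of the prefix advice above, the helper below prepends one of the three paradigm tokens to an input text before it is tokenized for fine-tuning. The function name, the mode keys, and the single space between token and text are assumptions made for illustration; check the task definitions used during pre-training if the exact formatting matters for your task.

```python
# Paradigm tokens and the pre-training denoiser each one was paired with.
PARADIGM_TOKENS = {
    "nlu": "[NLU]",  # R-denoising (regular span corruption)
    "nlg": "[NLG]",  # X-denoising (extreme span corruption)
    "s2s": "[S2S]",  # S-denoising (sequential PrefixLM)
}


def add_paradigm_prefix(text: str, mode: str) -> str:
    """Return `text` with the paradigm token for `mode` in front.

    Hypothetical helper: the separator (one space) is an assumption.
    """
    token = PARADIGM_TOKENS[mode.lower()]
    if text.startswith(token):
        return text  # already prefixed, leave unchanged
    return f"{token} {text}"


example = add_paradigm_prefix("Dit is een voorbeeldzin.", "nlu")
```

The prefixed string can then be passed to the tokenizer loaded in the "How to use" section in the usual way.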
87
+ ### How to use
88
+
89
+ Here is how to use this model in PyTorch:
90
+
91
+ ```python
92
+ from transformers import T5Tokenizer, T5ForConditionalGeneration
93
+
94
+ tokenizer = T5Tokenizer.from_pretrained("yhavinga/ul2-large-dutch", use_fast=False)
95
+ model = T5ForConditionalGeneration.from_pretrained("yhavinga/ul2-large-dutch")
96
+ ```
97
+
98
+ and in Flax:
99
+
100
+ ```python
101
+ from transformers import T5Tokenizer, FlaxT5ForConditionalGeneration
102
+
103
+ tokenizer = T5Tokenizer.from_pretrained("yhavinga/ul2-large-dutch", use_fast=False)
104
+ model = FlaxT5ForConditionalGeneration.from_pretrained("yhavinga/ul2-large-dutch")
105
+ ```
106
+
107
+
108
+ ### Limitations and bias
109
+
110
+ The training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral.
111
+ Therefore, the model can have biased predictions. This bias will also affect all fine-tuned versions of this model.
112
+
113
+ ## Training data
114
+
115
+ The `ul2-large-dutch` T5 model was pre-trained simultaneously on a combination of several datasets,
116
+ including the full version of the "mc4_nl_cleaned" dataset, which is a cleaned version of Common Crawl's web
117
+ crawl corpus, Dutch books, the Dutch subset of Wikipedia (2022-03-20), and a subset of "mc4_nl_cleaned"
118
+ containing only texts from Dutch and Belgian newspapers. This last dataset is oversampled to bias the model
119
+ towards descriptions of events in the Netherlands and Belgium.
120
+
121
+
122
+
123
+ ## Training procedure
124
+
125
+ ### Preprocessing
126
+
127
+ The ul2-large-dutch T5 model uses a SentencePiece unigram tokenizer with a vocabulary of 32,000 tokens.
128
+ The tokenizer includes the special tokens `<pad>`, `</s>`, `<unk>`, known from the original T5 paper,
129
+ `[NLU]`, `[NLG]` and `[S2S]` for the MoD pre-training, and `<n>` for newline.
130
+ During pre-training with the UL2 objective, input and output sequences consist of 512 consecutive tokens.
131
+ The tokenizer does not lowercase texts and is therefore case-sensitive; it distinguishes
132
+ between `dutch` and `Dutch`.
133
+ Additionally, 100+28 extra tokens were added for pre-training tasks, resulting in a total of 32,128 tokens.
134
+
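The vocabulary arithmetic above can be checked with a few lines of Python. The constants come from this section; the mapping of the 28 `[new_id_N]` tokens to ids 32100 through 32127 follows `added_tokens.json` in this commit. The helper name is hypothetical.

```python
BASE_VOCAB = 32_000   # SentencePiece unigram vocabulary
EXTRA_IDS = 100       # <extra_id_0> ... <extra_id_99> sentinel tokens
NEW_IDS = 28          # [new_id_0] ... [new_id_27] from added_tokens.json

total_vocab = BASE_VOCAB + EXTRA_IDS + NEW_IDS


def new_id_index(n: int) -> int:
    """Token id of `[new_id_n]`, following the layout in added_tokens.json."""
    if not 0 <= n < NEW_IDS:
        raise ValueError(f"n must be in [0, {NEW_IDS}), got {n}")
    return BASE_VOCAB + EXTRA_IDS + n


print(total_vocab)        # 32128, matching vocab_size in config.json
print(new_id_index(0))    # 32100
print(new_id_index(27))   # 32127
```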
135
+ ### Pretraining
136
+ The model was trained on a TPUv3-8 VM, sponsored by the [Google TPU Research Cloud](https://sites.research.google/trc/about/),
137
+ for 1000000 steps with a batch size of 64
138
+ (in total 32 B tokens).
139
+ The optimizer used was AdaFactor with a constant learning rate of 1e-2 during the first 10K (warmup) steps,
140
+ followed by an inverse square root decay of the learning rate.
141
+ The model was trained with Google's Jax/Flax based [t5x framework](https://github.com/google-research/t5x) with help
142
+ from [Stephenn Fernandes](https://huggingface.co/StephennFernandes) to get started writing task definitions that wrap
143
+ HF datasets.
144
+
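The schedule and token budget described above can be sketched in plain Python. This assumes that the `constant * rsqrt_decay` factors with `base_learning_rate = 1.0` and `warmup_steps = 10000` in `config.gin` evaluate to `1 / sqrt(max(step, warmup_steps))`; that is a reading of the config, not a copy of the t5x code. The token count assumes every packed input sequence holds the full 512 tokens.

```python
import math

WARMUP_STEPS = 10_000
BASE_LEARNING_RATE = 1.0


def learning_rate(step: int) -> float:
    """Constant during warmup, then inverse square root decay (assumed form)."""
    return BASE_LEARNING_RATE / math.sqrt(max(step, WARMUP_STEPS))


TRAIN_STEPS = 1_000_000
BATCH_SIZE = 64
INPUT_LENGTH = 512

# Upper bound on input tokens seen, assuming fully packed sequences.
total_input_tokens = TRAIN_STEPS * BATCH_SIZE * INPUT_LENGTH

for step in (0, 10_000, 40_000, 1_000_000):
    print(step, learning_rate(step))
print(total_input_tokens)  # 32768000000, roughly the 32 B tokens quoted above
```

Under this reading the learning rate stays at 1e-2 for the first 10K steps, halves by step 40K, and ends at 1e-3 after 1M steps.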
145
+ The UL2 training objective code used with the [t5x framework](https://github.com/google-research/t5x) was copied and
146
+ slightly modified from the [UL2 paper](https://arxiv.org/pdf/2205.05131.pdf) appendix chapter 9.2 by the authors
147
+ of the Finnish UL2 models. The UL2 objective code used is available in the repository
148
+ [Finnish-NLP/ul2-base-nl36-finnish](https://huggingface.co/Finnish-NLP/ul2-base-nl36-finnish) in the files `ul2_objective.py` and `tasks.py`.
149
+ UL2's mixture-of-denoisers configuration otherwise follows the UL2 paper,
150
+ except for the denoiser mixing rates: 20% was used for S-denoising (as suggested in chapter 4.5 of the paper)
151
+ and the rest was divided equally between R-denoising and X-denoising (i.e. 40% each).
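The mixing rates above can be illustrated with a small sampler that picks a denoiser for each training example. This is a simplified sketch, not the objective code from `ul2_objective.py`: the real configuration defines several span-length and corruption-rate settings per denoiser family, whereas this example only shows the family-level rates and the paradigm token that goes with each family. All names are hypothetical.

```python
import random

# Family-level mixing rates as described in this section.
DENOISER_RATES = {"R": 0.4, "X": 0.4, "S": 0.2}

# Paradigm token inserted into the input for each denoiser family.
PARADIGM_TOKEN = {"R": "[NLU]", "X": "[NLG]", "S": "[S2S]"}


def sample_denoiser(rng: random.Random) -> str:
    """Draw one denoiser family according to DENOISER_RATES."""
    names = list(DENOISER_RATES)
    weights = [DENOISER_RATES[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
counts = {name: 0 for name in DENOISER_RATES}
for _ in range(10_000):
    counts[sample_denoiser(rng)] += 1

print(counts)  # roughly 4000 / 4000 / 2000 for R / X / S
```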
152
+ ### Model list
153
+
154
+ Models in this series:
155
+ | | ul2-base-dutch | ul2-base-nl36-dutch | ul2-large-dutch | ul2-small-dutch |
156
+ |:---------------------|:---------------------|:----------------------|:---------------------|:---------------------|
157
+ | model_type | t5 | t5 | t5 | t5 |
158
+ | _pipeline_tag | text2text-generation | text2text-generation | text2text-generation | text2text-generation |
159
+ | d_model | 768 | 768 | 1024 | 512 |
160
+ | d_ff | 2048 | 3072 | 2816 | 1024 |
161
+ | num_heads | 12 | 12 | 16 | 6 |
162
+ | d_kv | 64 | 64 | 64 | 64 |
163
+ | num_layers | 12 | 36 | 24 | 8 |
164
+ | num_decoder_layers | 12 | 36 | 24 | 8 |
165
+ | feed_forward_proj | gated-gelu | gated-gelu | gated-gelu | gated-gelu |
166
+ | dense_act_fn | gelu_new | gelu_new | gelu_new | gelu_new |
167
+ | vocab_size | 32128 | 32128 | 32128 | 32128 |
168
+ | tie_word_embeddings | 0 | 0 | 0 | 0 |
169
+ | torch_dtype | float32 | float32 | float32 | float32 |
170
+ | _gin_batch_size | 128 | 64 | 64 | 128 |
171
+ | _gin_z_loss | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
172
+ | _gin_t5_config_dtype | 'bfloat16' | 'bfloat16' | 'bfloat16' | 'bfloat16' |
173
+
174
+
175
+
176
+ ## Evaluation results
177
+
178
+ See the evaluation section in the interactive [Pre-training Dutch T5 Models](https://huggingface.co/spaces/yhavinga/pre-training-dutch-t5-models) blog.
179
+
180
+ ## Acknowledgements
181
+
182
+ This project would not have been possible without compute generously provided by Google through the
183
+ [TPU Research Cloud](https://sites.research.google/trc/).
184
+ Thanks to the [Finnish-NLP](https://huggingface.co/Finnish-NLP) authors for releasing their code for the UL2 objective and associated task definitions.
185
+ Thanks to [Stephenn Fernandes](https://huggingface.co/StephennFernandes) for helping me get started with the t5x framework.
186
+
187
+ Created by [Yeb Havinga](https://www.linkedin.com/in/yeb-havinga-86530825/)
188
+
added_tokens.json ADDED
@@ -0,0 +1 @@
1
+ {"[new_id_17]": 32117, "[new_id_20]": 32120, "[new_id_13]": 32113, "[new_id_2]": 32102, "[new_id_16]": 32116, "[new_id_7]": 32107, "[new_id_5]": 32105, "[new_id_1]": 32101, "[new_id_15]": 32115, "[new_id_12]": 32112, "[new_id_0]": 32100, "[new_id_11]": 32111, "[new_id_25]": 32125, "[new_id_24]": 32124, "[new_id_10]": 32110, "[new_id_27]": 32127, "[new_id_23]": 32123, "[new_id_14]": 32114, "[new_id_22]": 32122, "[new_id_21]": 32121, "[new_id_19]": 32119, "[new_id_3]": 32103, "[new_id_4]": 32104, "[new_id_18]": 32118, "[new_id_9]": 32109, "[new_id_8]": 32108, "[new_id_26]": 32126, "[new_id_6]": 32106}
config.gin ADDED
@@ -0,0 +1,150 @@
1
+ from __gin__ import dynamic_registration
2
+ import __main__ as train_script
3
+ import seqio
4
+ import t5.data.mixtures
5
+ from t5x import adafactor
6
+ from t5x.examples.t5 import network
7
+ from t5x import gin_utils
8
+ from t5x import models
9
+ from t5x import partitioning
10
+ from t5x import trainer
11
+ from t5x import utils
12
+ import tasks.nedd_tasks
13
+ import tasks.ul2_tasks as tasks2
14
+
15
+ # Macros:
16
+ # ==============================================================================
17
+ BATCH_SIZE = 64
18
+ DROPOUT_RATE = 0.0
19
+ LABEL_SMOOTHING = 0.0
20
+ LOSS_NORMALIZING_FACTOR = None
21
+ MIXTURE_OR_TASK_MODULE = None
22
+ MIXTURE_OR_TASK_NAME = 'ul2_mc4_nedd_wiki_news_mix_1'
23
+ MODEL = @models.EncoderDecoderModel()
24
+ MODEL_DIR = 'ul2_large_mc4_nedd_wiki_news_nl'
25
+ OPTIMIZER = @adafactor.Adafactor()
26
+ RANDOM_SEED = None
27
+ SHUFFLE_TRAIN_EXAMPLES = True
28
+ TASK_FEATURE_LENGTHS = {'inputs': 512, 'targets': 512}
29
+ TRAIN_STEPS = 1000000
30
+ USE_CACHED_TASKS = False
31
+ USE_HARDWARE_RNG = False
32
+ VOCABULARY = @seqio.SentencePieceVocabulary()
33
+ Z_LOSS = 0.0001
34
+
35
+ # Parameters for adafactor.Adafactor:
36
+ # ==============================================================================
37
+ adafactor.Adafactor.decay_rate = 0.8
38
+ adafactor.Adafactor.logical_factor_rules = \
39
+ @adafactor.standard_logical_factor_rules()
40
+ adafactor.Adafactor.step_offset = 0
41
+
42
+ # Parameters for utils.CheckpointConfig:
43
+ # ==============================================================================
44
+ utils.CheckpointConfig.restore = @utils.RestoreCheckpointConfig()
45
+ utils.CheckpointConfig.save = @utils.SaveCheckpointConfig()
46
+
47
+ # Parameters for utils.create_learning_rate_scheduler:
48
+ # ==============================================================================
49
+ utils.create_learning_rate_scheduler.base_learning_rate = 1.0
50
+ utils.create_learning_rate_scheduler.factors = 'constant * rsqrt_decay'
51
+ utils.create_learning_rate_scheduler.warmup_steps = 10000
52
+
53
+ # Parameters for train/utils.DatasetConfig:
54
+ # ==============================================================================
55
+ train/utils.DatasetConfig.batch_size = %BATCH_SIZE
56
+ train/utils.DatasetConfig.mixture_or_task_name = %MIXTURE_OR_TASK_NAME
57
+ train/utils.DatasetConfig.module = %MIXTURE_OR_TASK_MODULE
58
+ train/utils.DatasetConfig.pack = True
59
+ train/utils.DatasetConfig.seed = None
60
+ train/utils.DatasetConfig.shuffle = %SHUFFLE_TRAIN_EXAMPLES
61
+ train/utils.DatasetConfig.split = 'train'
62
+ train/utils.DatasetConfig.task_feature_lengths = %TASK_FEATURE_LENGTHS
63
+ train/utils.DatasetConfig.use_cached = %USE_CACHED_TASKS
64
+
65
+ # Parameters for train_eval/utils.DatasetConfig:
66
+ # ==============================================================================
67
+ train_eval/utils.DatasetConfig.batch_size = %BATCH_SIZE
68
+ train_eval/utils.DatasetConfig.mixture_or_task_name = %MIXTURE_OR_TASK_NAME
69
+ train_eval/utils.DatasetConfig.module = %MIXTURE_OR_TASK_MODULE
70
+ train_eval/utils.DatasetConfig.pack = True
71
+ train_eval/utils.DatasetConfig.seed = 42
72
+ train_eval/utils.DatasetConfig.shuffle = False
73
+ train_eval/utils.DatasetConfig.split = 'validation'
74
+ train_eval/utils.DatasetConfig.task_feature_lengths = %TASK_FEATURE_LENGTHS
75
+ train_eval/utils.DatasetConfig.use_cached = %USE_CACHED_TASKS
76
+
77
+ # Parameters for models.EncoderDecoderModel:
78
+ # ==============================================================================
79
+ models.EncoderDecoderModel.input_vocabulary = %VOCABULARY
80
+ models.EncoderDecoderModel.label_smoothing = %LABEL_SMOOTHING
81
+ models.EncoderDecoderModel.loss_normalizing_factor = %LOSS_NORMALIZING_FACTOR
82
+ models.EncoderDecoderModel.module = @network.Transformer()
83
+ models.EncoderDecoderModel.optimizer_def = %OPTIMIZER
84
+ models.EncoderDecoderModel.output_vocabulary = %VOCABULARY
85
+ models.EncoderDecoderModel.z_loss = %Z_LOSS
86
+
87
+ # Parameters for partitioning.PjitPartitioner:
88
+ # ==============================================================================
89
+ partitioning.PjitPartitioner.logical_axis_rules = \
90
+ @partitioning.standard_logical_axis_rules()
91
+ partitioning.PjitPartitioner.model_parallel_submesh = None
92
+ partitioning.PjitPartitioner.num_partitions = 1
93
+
94
+ # Parameters for utils.RestoreCheckpointConfig:
95
+ # ==============================================================================
96
+ utils.RestoreCheckpointConfig.path = []
97
+
98
+ # Parameters for utils.SaveCheckpointConfig:
99
+ # ==============================================================================
100
+ utils.SaveCheckpointConfig.dtype = 'float32'
101
+ utils.SaveCheckpointConfig.keep = 4
102
+ utils.SaveCheckpointConfig.period = 50000
103
+ utils.SaveCheckpointConfig.save_dataset = False
104
+ utils.SaveCheckpointConfig.use_gda = False
105
+
106
+ # Parameters for seqio.SentencePieceVocabulary:
107
+ # ==============================================================================
108
+ seqio.SentencePieceVocabulary.sentencepiece_model_file = \
109
+ 'gs://t5-dutch-english/vocabs/nedd.32000.128extra/spiece.model'
110
+
111
+ # Parameters for network.T5Config:
112
+ # ==============================================================================
113
+ network.T5Config.dropout_rate = %DROPOUT_RATE
114
+ network.T5Config.dtype = 'bfloat16'
115
+ network.T5Config.emb_dim = 1024
116
+ network.T5Config.head_dim = 64
117
+ network.T5Config.logits_via_embedding = False
118
+ network.T5Config.mlp_activations = ('gelu', 'linear')
119
+ network.T5Config.mlp_dim = 2816
120
+ network.T5Config.num_decoder_layers = 24
121
+ network.T5Config.num_encoder_layers = 24
122
+ network.T5Config.num_heads = 16
123
+ network.T5Config.vocab_size = 32128
124
+
125
+ # Parameters for train_script.train:
126
+ # ==============================================================================
127
+ train_script.train.checkpoint_cfg = @utils.CheckpointConfig()
128
+ train_script.train.eval_period = 2000
129
+ train_script.train.eval_steps = 20
130
+ train_script.train.infer_eval_dataset_cfg = None
131
+ train_script.train.model = %MODEL
132
+ train_script.train.model_dir = %MODEL_DIR
133
+ train_script.train.partitioner = @partitioning.PjitPartitioner()
134
+ train_script.train.random_seed = %RANDOM_SEED
135
+ train_script.train.stats_period = 100
136
+ train_script.train.summarize_config_fn = @gin_utils.summarize_gin_config
137
+ train_script.train.total_steps = %TRAIN_STEPS
138
+ train_script.train.train_dataset_cfg = @train/utils.DatasetConfig()
139
+ train_script.train.train_eval_dataset_cfg = @train_eval/utils.DatasetConfig()
140
+ train_script.train.trainer_cls = @trainer.Trainer
141
+ train_script.train.use_hardware_rng = %USE_HARDWARE_RNG
142
+
143
+ # Parameters for trainer.Trainer:
144
+ # ==============================================================================
145
+ trainer.Trainer.learning_rate_fn = @utils.create_learning_rate_scheduler()
146
+ trainer.Trainer.num_microbatches = None
147
+
148
+ # Parameters for network.Transformer:
149
+ # ==============================================================================
150
+ network.Transformer.config = @network.T5Config()
config.json ADDED
@@ -0,0 +1,32 @@
1
+ {
2
+ "_name_or_path": "./",
3
+ "architectures": [
4
+ "T5ForConditionalGeneration"
5
+ ],
6
+ "d_ff": 2816,
7
+ "d_kv": 64,
8
+ "d_model": 1024,
9
+ "decoder_start_token_id": 0,
10
+ "dense_act_fn": "gelu_new",
11
+ "dropout_rate": 0.1,
12
+ "eos_token_id": 1,
13
+ "feed_forward_proj": "gated-gelu",
14
+ "initializer_factor": 1.0,
15
+ "is_encoder_decoder": true,
16
+ "is_gated_act": true,
17
+ "layer_norm_epsilon": 1e-06,
18
+ "model_type": "t5",
19
+ "n_positions": 512,
20
+ "num_decoder_layers": 24,
21
+ "num_heads": 16,
22
+ "num_layers": 24,
23
+ "output_past": true,
24
+ "pad_token_id": 0,
25
+ "relative_attention_max_distance": 128,
26
+ "relative_attention_num_buckets": 32,
27
+ "tie_word_embeddings": false,
28
+ "torch_dtype": "float32",
29
+ "transformers_version": "4.24.0",
30
+ "use_cache": true,
31
+ "vocab_size": 32128
32
+ }
flax_model.msgpack ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:814634a5ae7885fd1972d69bd742f137db4217bc82bd2ae2e685cea31b17d45e
3
+ size 3132624407
model-info.txt ADDED
The diff for this file is too large to render. See raw diff
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:802b9f1157b55997a0464d3526bf44f62d560ab1b4777b0cf1475ffc3e715a87
3
+ size 3132785797
special_tokens_map.json ADDED
@@ -0,0 +1,107 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<extra_id_0>",
4
+ "<extra_id_1>",
5
+ "<extra_id_2>",
6
+ "<extra_id_3>",
7
+ "<extra_id_4>",
8
+ "<extra_id_5>",
9
+ "<extra_id_6>",
10
+ "<extra_id_7>",
11
+ "<extra_id_8>",
12
+ "<extra_id_9>",
13
+ "<extra_id_10>",
14
+ "<extra_id_11>",
15
+ "<extra_id_12>",
16
+ "<extra_id_13>",
17
+ "<extra_id_14>",
18
+ "<extra_id_15>",
19
+ "<extra_id_16>",
20
+ "<extra_id_17>",
21
+ "<extra_id_18>",
22
+ "<extra_id_19>",
23
+ "<extra_id_20>",
24
+ "<extra_id_21>",
25
+ "<extra_id_22>",
26
+ "<extra_id_23>",
27
+ "<extra_id_24>",
28
+ "<extra_id_25>",
29
+ "<extra_id_26>",
30
+ "<extra_id_27>",
31
+ "<extra_id_28>",
32
+ "<extra_id_29>",
33
+ "<extra_id_30>",
34
+ "<extra_id_31>",
35
+ "<extra_id_32>",
36
+ "<extra_id_33>",
37
+ "<extra_id_34>",
38
+ "<extra_id_35>",
39
+ "<extra_id_36>",
40
+ "<extra_id_37>",
41
+ "<extra_id_38>",
42
+ "<extra_id_39>",
43
+ "<extra_id_40>",
44
+ "<extra_id_41>",
45
+ "<extra_id_42>",
46
+ "<extra_id_43>",
47
+ "<extra_id_44>",
48
+ "<extra_id_45>",
49
+ "<extra_id_46>",
50
+ "<extra_id_47>",
51
+ "<extra_id_48>",
52
+ "<extra_id_49>",
53
+ "<extra_id_50>",
54
+ "<extra_id_51>",
55
+ "<extra_id_52>",
56
+ "<extra_id_53>",
57
+ "<extra_id_54>",
58
+ "<extra_id_55>",
59
+ "<extra_id_56>",
60
+ "<extra_id_57>",
61
+ "<extra_id_58>",
62
+ "<extra_id_59>",
63
+ "<extra_id_60>",
64
+ "<extra_id_61>",
65
+ "<extra_id_62>",
66
+ "<extra_id_63>",
67
+ "<extra_id_64>",
68
+ "<extra_id_65>",
69
+ "<extra_id_66>",
70
+ "<extra_id_67>",
71
+ "<extra_id_68>",
72
+ "<extra_id_69>",
73
+ "<extra_id_70>",
74
+ "<extra_id_71>",
75
+ "<extra_id_72>",
76
+ "<extra_id_73>",
77
+ "<extra_id_74>",
78
+ "<extra_id_75>",
79
+ "<extra_id_76>",
80
+ "<extra_id_77>",
81
+ "<extra_id_78>",
82
+ "<extra_id_79>",
83
+ "<extra_id_80>",
84
+ "<extra_id_81>",
85
+ "<extra_id_82>",
86
+ "<extra_id_83>",
87
+ "<extra_id_84>",
88
+ "<extra_id_85>",
89
+ "<extra_id_86>",
90
+ "<extra_id_87>",
91
+ "<extra_id_88>",
92
+ "<extra_id_89>",
93
+ "<extra_id_90>",
94
+ "<extra_id_91>",
95
+ "<extra_id_92>",
96
+ "<extra_id_93>",
97
+ "<extra_id_94>",
98
+ "<extra_id_95>",
99
+ "<extra_id_96>",
100
+ "<extra_id_97>",
101
+ "<extra_id_98>",
102
+ "<extra_id_99>"
103
+ ],
104
+ "eos_token": "</s>",
105
+ "pad_token": "<pad>",
106
+ "unk_token": "<unk>"
107
+ }
spiece.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:caa6e2f21aeec181276ab80273e3f869ce303ccb8602d68e0524783c3581092d
3
+ size 800223
spiece.vocab ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,113 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<extra_id_0>",
4
+ "<extra_id_1>",
5
+ "<extra_id_2>",
6
+ "<extra_id_3>",
7
+ "<extra_id_4>",
8
+ "<extra_id_5>",
9
+ "<extra_id_6>",
10
+ "<extra_id_7>",
11
+ "<extra_id_8>",
12
+ "<extra_id_9>",
13
+ "<extra_id_10>",
14
+ "<extra_id_11>",
15
+ "<extra_id_12>",
16
+ "<extra_id_13>",
17
+ "<extra_id_14>",
18
+ "<extra_id_15>",
19
+ "<extra_id_16>",
20
+ "<extra_id_17>",
21
+ "<extra_id_18>",
22
+ "<extra_id_19>",
23
+ "<extra_id_20>",
24
+ "<extra_id_21>",
25
+ "<extra_id_22>",
26
+ "<extra_id_23>",
27
+ "<extra_id_24>",
28
+ "<extra_id_25>",
29
+ "<extra_id_26>",
30
+ "<extra_id_27>",
31
+ "<extra_id_28>",
32
+ "<extra_id_29>",
33
+ "<extra_id_30>",
34
+ "<extra_id_31>",
35
+ "<extra_id_32>",
36
+ "<extra_id_33>",
37
+ "<extra_id_34>",
38
+ "<extra_id_35>",
39
+ "<extra_id_36>",
40
+ "<extra_id_37>",
41
+ "<extra_id_38>",
42
+ "<extra_id_39>",
43
+ "<extra_id_40>",
44
+ "<extra_id_41>",
45
+ "<extra_id_42>",
46
+ "<extra_id_43>",
47
+ "<extra_id_44>",
48
+ "<extra_id_45>",
49
+ "<extra_id_46>",
50
+ "<extra_id_47>",
51
+ "<extra_id_48>",
52
+ "<extra_id_49>",
53
+ "<extra_id_50>",
54
+ "<extra_id_51>",
55
+ "<extra_id_52>",
56
+ "<extra_id_53>",
57
+ "<extra_id_54>",
58
+ "<extra_id_55>",
59
+ "<extra_id_56>",
60
+ "<extra_id_57>",
61
+ "<extra_id_58>",
62
+ "<extra_id_59>",
63
+ "<extra_id_60>",
64
+ "<extra_id_61>",
65
+ "<extra_id_62>",
66
+ "<extra_id_63>",
67
+ "<extra_id_64>",
68
+ "<extra_id_65>",
69
+ "<extra_id_66>",
70
+ "<extra_id_67>",
71
+ "<extra_id_68>",
72
+ "<extra_id_69>",
73
+ "<extra_id_70>",
74
+ "<extra_id_71>",
75
+ "<extra_id_72>",
76
+ "<extra_id_73>",
77
+ "<extra_id_74>",
78
+ "<extra_id_75>",
79
+ "<extra_id_76>",
80
+ "<extra_id_77>",
81
+ "<extra_id_78>",
82
+ "<extra_id_79>",
83
+ "<extra_id_80>",
84
+ "<extra_id_81>",
85
+ "<extra_id_82>",
86
+ "<extra_id_83>",
87
+ "<extra_id_84>",
88
+ "<extra_id_85>",
89
+ "<extra_id_86>",
90
+ "<extra_id_87>",
91
+ "<extra_id_88>",
92
+ "<extra_id_89>",
93
+ "<extra_id_90>",
94
+ "<extra_id_91>",
95
+ "<extra_id_92>",
96
+ "<extra_id_93>",
97
+ "<extra_id_94>",
98
+ "<extra_id_95>",
99
+ "<extra_id_96>",
100
+ "<extra_id_97>",
101
+ "<extra_id_98>",
102
+ "<extra_id_99>"
103
+ ],
104
+ "eos_token": "</s>",
105
+ "extra_ids": 100,
106
+ "name_or_path": "yhavinga/ul2-large-dutch",
107
+ "pad_token": "<pad>",
108
+ "sp_model_kwargs": {},
109
+ "special_tokens_map_file": null,
110
+ "tokenizer_class": "T5Tokenizer",
111
+ "unk_token": "<unk>",
112
+ "use_fast_tokenizer": false
113
+ }
train/events.out.tfevents.1670184055.t1v-n-a765f9c4-w-0.454498.0.v2 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:37e948013d1995b859e95efffd4f9802191478fe3a90de95888844b66ea33aca
3
+ size 8900201
train/events.out.tfevents.1670537273.t1v-n-a765f9c4-w-0.181290.0.v2 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3f24fb9a95bc699d38810149c94f1693bfe5581671985e211a4e37cdeac9bd23
3
+ size 11923209
training_eval/mc4_nl_ul2_denoising/events.out.tfevents.1670184055.t1v-n-a765f9c4-w-0.454498.1.v2 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c90b471aab9a8ee65941e5534d9f2af329350cdd429e671f4325454edaa261ea
3
+ size 393973
training_eval/mc4_nl_ul2_denoising/events.out.tfevents.1670537273.t1v-n-a765f9c4-w-0.181290.1.v2 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0d5fec75f40e46c3f7a979dfcc32f2ab4bba1473915ff39d25d3d7d6be906646
3
+ size 527826
training_eval/ul2_mc4_nedd_wiki_news_mix_1/events.out.tfevents.1670184055.t1v-n-a765f9c4-w-0.454498.2.v2 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c0885b6ef7619ec4445c2da900962a94fea408376b75ee5bb48480fae448a625
3
+ size 393973
training_eval/ul2_mc4_nedd_wiki_news_mix_1/events.out.tfevents.1670537273.t1v-n-a765f9c4-w-0.181290.2.v2 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:233df9a446688295c00aefaae6984514b4385aedb93a3d80ecc692b7b70dc5ba
3
+ size 527826