CUDA extension not installed.
Downloading (…)lve/main/config.json: 100%|██████████| 662/662 [00:00<00:00, 1.65MB/s]
Downloading pytorch_model.bin: 100%|██████████| 3.13G/3.13G [00:36<00:00, 86.9MB/s]
Some weights of the model checkpoint at google/flan-t5-large were not used when initializing T5EncoderModel: ['decoder.block.4.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.20.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.0.SelfAttention.k.weight', 'decoder.block.13.layer.0.SelfAttention.k.weight', 'decoder.block.20.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.7.layer.0.SelfAttention.v.weight', 'decoder.block.8.layer.2.layer_norm.weight', 'decoder.embed_tokens.weight', 'decoder.block.23.layer.0.SelfAttention.k.weight', 'decoder.block.17.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'decoder.block.21.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.2.layer.0.layer_norm.weight', 'decoder.block.21.layer.1.EncDecAttention.k.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.18.layer.1.EncDecAttention.k.weight', 'decoder.block.9.layer.1.EncDecAttention.k.weight', 'decoder.block.13.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.12.layer.0.layer_norm.weight', 'decoder.block.23.layer.2.DenseReluDense.wo.weight', 'decoder.block.21.layer.1.EncDecAttention.v.weight', 'decoder.block.18.layer.0.SelfAttention.k.weight', 'decoder.block.15.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.20.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.1.layer_norm.weight', 'decoder.block.10.layer.1.layer_norm.weight', 'decoder.block.12.layer.1.EncDecAttention.q.weight', 'decoder.block.9.layer.0.layer_norm.weight', 'decoder.block.0.layer.0.layer_norm.weight', 'decoder.block.14.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.2.layer_norm.weight', 'decoder.block.23.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.15.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.8.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.0.SelfAttention.v.weight', 'decoder.block.3.layer.1.layer_norm.weight', 
'decoder.block.9.layer.1.EncDecAttention.v.weight', 'decoder.block.12.layer.1.EncDecAttention.o.weight', 'decoder.block.23.layer.0.SelfAttention.q.weight', 'decoder.block.2.layer.0.SelfAttention.v.weight', 'decoder.block.13.layer.0.SelfAttention.o.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.6.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.0.SelfAttention.k.weight', 'decoder.block.2.layer.2.layer_norm.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.15.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.5.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.1.layer_norm.weight', 'decoder.block.10.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'decoder.block.11.layer.0.SelfAttention.v.weight', 'decoder.block.23.layer.1.EncDecAttention.v.weight', 'decoder.block.8.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.2.DenseReluDense.wo.weight', 'decoder.block.16.layer.1.EncDecAttention.v.weight', 'decoder.block.18.layer.0.layer_norm.weight', 'decoder.block.11.layer.1.layer_norm.weight', 'decoder.block.22.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.20.layer.1.layer_norm.weight', 'decoder.block.11.layer.1.EncDecAttention.v.weight', 'decoder.block.9.layer.0.SelfAttention.o.weight', 'decoder.block.3.layer.0.SelfAttention.q.weight', 'decoder.block.11.layer.0.layer_norm.weight', 'decoder.block.7.layer.0.layer_norm.weight', 'decoder.block.13.layer.0.SelfAttention.v.weight', 'decoder.block.21.layer.2.layer_norm.weight', 'decoder.block.20.layer.1.EncDecAttention.o.weight', 'decoder.block.16.layer.0.SelfAttention.q.weight', 'decoder.block.16.layer.0.SelfAttention.v.weight', 'decoder.block.17.layer.2.DenseReluDense.wo.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_0.weight', 
'decoder.block.22.layer.2.layer_norm.weight', 'decoder.block.19.layer.2.layer_norm.weight', 'decoder.block.8.layer.0.SelfAttention.k.weight', 'decoder.block.10.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.block.13.layer.0.SelfAttention.q.weight', 'decoder.block.17.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.23.layer.0.layer_norm.weight', 'decoder.block.19.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.5.layer.0.SelfAttention.v.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 'decoder.block.15.layer.1.EncDecAttention.k.weight', 'decoder.block.23.layer.0.SelfAttention.o.weight', 'decoder.block.3.layer.2.DenseReluDense.wo.weight', 'decoder.block.17.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.9.layer.0.SelfAttention.q.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.19.layer.1.EncDecAttention.o.weight', 'decoder.block.11.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.0.SelfAttention.v.weight', 'decoder.block.18.layer.2.DenseReluDense.wo.weight', 'decoder.block.11.layer.0.SelfAttention.k.weight', 'decoder.block.6.layer.0.SelfAttention.q.weight', 'decoder.block.9.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.1.layer_norm.weight', 'decoder.block.22.layer.0.SelfAttention.v.weight', 'decoder.block.15.layer.1.layer_norm.weight', 'decoder.block.20.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.18.layer.1.EncDecAttention.q.weight', 'decoder.block.23.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.8.layer.1.EncDecAttention.v.weight', 'decoder.block.6.layer.1.EncDecAttention.v.weight', 'decoder.block.18.layer.1.EncDecAttention.v.weight', 
'decoder.block.10.layer.1.EncDecAttention.o.weight', 'decoder.block.21.layer.2.DenseReluDense.wo.weight', 'decoder.block.21.layer.0.layer_norm.weight', 'decoder.block.22.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.5.layer.0.layer_norm.weight', 'decoder.block.23.layer.1.EncDecAttention.o.weight', 'decoder.block.17.layer.0.SelfAttention.q.weight', 'decoder.block.22.layer.1.EncDecAttention.k.weight', 'decoder.block.4.layer.0.layer_norm.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.5.layer.1.EncDecAttention.k.weight', 'decoder.block.17.layer.1.EncDecAttention.k.weight', 'decoder.block.13.layer.1.EncDecAttention.k.weight', 'decoder.block.19.layer.0.SelfAttention.k.weight', 'decoder.block.7.layer.1.EncDecAttention.k.weight', 'decoder.block.7.layer.0.SelfAttention.o.weight', 'decoder.block.5.layer.2.DenseReluDense.wo.weight', 'decoder.block.7.layer.0.SelfAttention.k.weight', 'decoder.block.12.layer.1.layer_norm.weight', 'decoder.block.11.layer.0.SelfAttention.q.weight', 'decoder.block.20.layer.0.SelfAttention.v.weight', 'decoder.block.12.layer.2.layer_norm.weight', 'decoder.block.17.layer.1.EncDecAttention.q.weight', 'decoder.block.8.layer.0.layer_norm.weight', 'decoder.block.11.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.13.layer.2.DenseReluDense.wo.weight', 'decoder.block.21.layer.0.SelfAttention.k.weight', 'decoder.block.23.layer.0.SelfAttention.v.weight', 'decoder.block.20.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.0.SelfAttention.k.weight', 'decoder.block.14.layer.1.EncDecAttention.q.weight', 'decoder.block.15.layer.0.SelfAttention.q.weight', 'decoder.block.21.layer.1.EncDecAttention.q.weight', 'decoder.block.21.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.16.layer.1.EncDecAttention.k.weight', 'decoder.block.18.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_1.weight', 
'decoder.block.6.layer.0.SelfAttention.k.weight', 'decoder.block.15.layer.1.EncDecAttention.v.weight', 'decoder.block.17.layer.1.EncDecAttention.o.weight', 'decoder.block.20.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.1.EncDecAttention.k.weight', 'decoder.block.17.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.1.EncDecAttention.k.weight', 'decoder.block.19.layer.1.EncDecAttention.q.weight', 'decoder.block.12.layer.1.EncDecAttention.k.weight', 'decoder.block.16.layer.1.EncDecAttention.q.weight', 'decoder.block.4.layer.1.EncDecAttention.v.weight', 'decoder.block.22.layer.1.EncDecAttention.o.weight', 'decoder.block.5.layer.1.EncDecAttention.q.weight', 'lm_head.weight', 'decoder.block.17.layer.2.layer_norm.weight', 'decoder.block.18.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.8.layer.1.EncDecAttention.q.weight', 'decoder.block.10.layer.1.EncDecAttention.k.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.19.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.17.layer.0.SelfAttention.o.weight', 'decoder.block.14.layer.1.EncDecAttention.k.weight', 'decoder.block.21.layer.0.SelfAttention.q.weight', 'decoder.block.10.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.0.layer_norm.weight', 'decoder.block.21.layer.1.EncDecAttention.o.weight', 'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.0.SelfAttention.v.weight', 'decoder.block.22.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.2.layer_norm.weight', 'decoder.block.9.layer.0.SelfAttention.k.weight', 'decoder.block.4.layer.0.SelfAttention.o.weight', 'decoder.block.5.layer.1.layer_norm.weight', 'decoder.block.23.layer.2.layer_norm.weight', 'decoder.block.17.layer.0.layer_norm.weight', 'decoder.block.14.layer.0.SelfAttention.o.weight', 'decoder.block.14.layer.2.layer_norm.weight', 'decoder.block.5.layer.2.layer_norm.weight', 'decoder.block.4.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_0.weight', 
'decoder.block.9.layer.1.layer_norm.weight', 'decoder.block.20.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.14.layer.2.DenseReluDense.wo.weight', 'decoder.block.7.layer.1.EncDecAttention.v.weight', 'decoder.block.16.layer.1.layer_norm.weight', 'decoder.block.2.layer.0.SelfAttention.q.weight', 'decoder.block.19.layer.0.SelfAttention.v.weight', 'decoder.block.6.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.1.EncDecAttention.o.weight', 'decoder.block.5.layer.0.SelfAttention.q.weight', 'decoder.block.15.layer.0.SelfAttention.k.weight', 'decoder.block.19.layer.1.EncDecAttention.k.weight', 'decoder.block.6.layer.1.layer_norm.weight', 'decoder.block.22.layer.1.EncDecAttention.v.weight', 'decoder.block.5.layer.0.SelfAttention.o.weight', 'decoder.block.14.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.16.layer.2.layer_norm.weight', 'decoder.block.4.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.15.layer.0.layer_norm.weight', 'decoder.block.16.layer.0.SelfAttention.k.weight', 'decoder.block.23.layer.1.layer_norm.weight', 'decoder.block.8.layer.0.SelfAttention.v.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.12.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.2.layer.1.layer_norm.weight', 'decoder.block.13.layer.1.EncDecAttention.v.weight', 'decoder.block.9.layer.0.SelfAttention.v.weight', 'decoder.block.3.layer.1.EncDecAttention.v.weight', 'decoder.block.20.layer.0.layer_norm.weight', 'decoder.block.13.layer.2.layer_norm.weight', 'decoder.block.16.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.1.EncDecAttention.v.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.6.layer.2.layer_norm.weight', 'decoder.block.21.layer.0.SelfAttention.o.weight', 'decoder.block.8.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.4.layer.2.DenseReluDense.wo.weight', 'decoder.block.12.layer.0.SelfAttention.o.weight', 
'decoder.block.6.layer.0.SelfAttention.o.weight', 'decoder.block.11.layer.2.layer_norm.weight', 'decoder.block.12.layer.1.EncDecAttention.v.weight', 'decoder.block.22.layer.0.SelfAttention.q.weight', 'decoder.block.19.layer.0.SelfAttention.q.weight', 'decoder.block.16.layer.1.EncDecAttention.o.weight', 'decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.17.layer.0.SelfAttention.v.weight', 'decoder.block.6.layer.2.DenseReluDense.wo.weight', 'decoder.block.10.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.18.layer.0.SelfAttention.o.weight', 'decoder.block.19.layer.1.EncDecAttention.v.weight', 'decoder.block.14.layer.0.layer_norm.weight', 'decoder.block.12.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.1.EncDecAttention.o.weight', 'decoder.block.10.layer.0.layer_norm.weight', 'decoder.block.9.layer.1.EncDecAttention.q.weight', 'decoder.block.12.layer.0.SelfAttention.k.weight', 'decoder.block.10.layer.0.SelfAttention.v.weight', 'decoder.block.8.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.3.layer.1.EncDecAttention.o.weight', 'decoder.block.11.layer.1.EncDecAttention.k.weight', 'decoder.block.18.layer.0.SelfAttention.q.weight', 'decoder.block.4.layer.2.layer_norm.weight', 'decoder.block.19.layer.2.DenseReluDense.wo.weight', 'decoder.block.3.layer.1.EncDecAttention.q.weight', 'decoder.block.22.layer.2.DenseReluDense.wo.weight', 'decoder.block.14.layer.0.SelfAttention.q.weight', 'decoder.block.13.layer.1.layer_norm.weight', 'decoder.block.6.layer.0.layer_norm.weight', 'decoder.block.4.layer.0.SelfAttention.q.weight', 'decoder.block.19.layer.0.layer_norm.weight', 'decoder.block.3.layer.0.layer_norm.weight', 'decoder.block.2.layer.1.EncDecAttention.v.weight', 'decoder.block.23.layer.1.EncDecAttention.k.weight', 'decoder.block.20.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.2.layer.1.EncDecAttention.q.weight', 
'decoder.block.10.layer.1.EncDecAttention.v.weight', 'decoder.block.16.layer.0.layer_norm.weight', 'decoder.block.18.layer.0.SelfAttention.v.weight', 'decoder.block.12.layer.0.SelfAttention.q.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.5.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.13.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.0.SelfAttention.o.weight', 'decoder.block.2.layer.2.DenseReluDense.wo.weight', 'decoder.block.9.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.15.layer.2.DenseReluDense.wo.weight', 'decoder.block.4.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.0.SelfAttention.q.weight', 'decoder.block.13.layer.1.EncDecAttention.q.weight', 'decoder.block.5.layer.1.EncDecAttention.v.weight', 'decoder.block.17.layer.1.layer_norm.weight', 'decoder.block.16.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.11.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.15.layer.1.EncDecAttention.o.weight', 'decoder.block.10.layer.2.DenseReluDense.wo.weight', 'decoder.block.13.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight', 'decoder.block.14.layer.1.layer_norm.weight', 'decoder.block.19.layer.0.SelfAttention.o.weight', 'decoder.block.13.layer.0.layer_norm.weight', 'decoder.block.6.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.0.SelfAttention.o.weight', 'decoder.block.22.layer.1.layer_norm.weight', 'decoder.block.8.layer.2.DenseReluDense.wo.weight', 'decoder.block.19.layer.1.layer_norm.weight', 'decoder.block.21.layer.1.layer_norm.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.16.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.2.layer.1.EncDecAttention.k.weight', 'decoder.block.18.layer.1.layer_norm.weight', 'decoder.block.1.layer.1.EncDecAttention.k.weight', 
'decoder.block.11.layer.2.DenseReluDense.wo.weight', 'decoder.block.18.layer.2.layer_norm.weight', 'decoder.block.16.layer.0.SelfAttention.o.weight', 'decoder.block.12.layer.2.DenseReluDense.wo.weight', 'decoder.block.11.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.2.layer_norm.weight', 'decoder.block.18.layer.1.EncDecAttention.o.weight', 'decoder.block.9.layer.1.EncDecAttention.o.weight', 'decoder.block.20.layer.1.EncDecAttention.q.weight', 'decoder.block.4.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.1.EncDecAttention.q.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.15.layer.0.SelfAttention.v.weight', 'decoder.block.10.layer.0.SelfAttention.o.weight', 'decoder.block.15.layer.2.layer_norm.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.14.layer.0.SelfAttention.k.weight', 'decoder.block.22.layer.0.SelfAttention.o.weight', 'decoder.block.20.layer.2.layer_norm.weight', 'decoder.block.10.layer.2.layer_norm.weight', 'decoder.block.6.layer.1.EncDecAttention.k.weight', 'decoder.block.10.layer.0.SelfAttention.q.weight', 'decoder.final_layer_norm.weight', 'decoder.block.15.layer.1.EncDecAttention.q.weight', 'decoder.block.3.layer.1.EncDecAttention.k.weight', 'decoder.block.12.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.23.layer.1.EncDecAttention.q.weight', 'decoder.block.11.layer.1.EncDecAttention.o.weight']
- This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
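Every name in the unused-weights list above starts with `decoder.` or is `lm_head.weight`, which is what one would expect when only the encoder half of an encoder-decoder checkpoint is loaded. The sketch below illustrates that split on a handful of key names. The helper `split_checkpoint_keys` and the assumption that an encoder-only model keeps only keys prefixed `encoder.` or `shared.` are illustrative, not taken from the library's loading code.

```python
def split_checkpoint_keys(keys, kept_prefixes=("encoder.", "shared.")):
    """Partition checkpoint key names into those an encoder-only model
    would keep and those it would ignore.

    kept_prefixes is an assumption made for this illustration.
    """
    used, unused = [], []
    for key in keys:
        # str.startswith accepts a tuple and matches any of its prefixes.
        (used if key.startswith(kept_prefixes) else unused).append(key)
    return used, unused


# A few key names; the decoder and lm_head names appear in the warning above.
sample_keys = [
    "shared.weight",
    "encoder.block.0.layer.0.SelfAttention.q.weight",
    "decoder.block.4.layer.2.DenseReluDense.wi_0.weight",
    "decoder.final_layer_norm.weight",
    "lm_head.weight",
]

used, unused = split_checkpoint_keys(sample_keys)
print("kept:   ", used)
print("ignored:", unused)
```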
Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
Downloading (…)okenizer_config.json: 100%|██████████| 2.54k/2.54k [00:00<00:00, 9.09MB/s]
Downloading spiece.model: 100%|██████████| 792k/792k [00:00<00:00, 28.7MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 2.20k/2.20k [00:00<00:00, 7.83MB/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (2837981 > 512). Running this sequence through the model will result in indexing errors
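The warning above appears because the whole corpus was tokenized as one sequence of 2837981 tokens, far beyond the 512-token limit. A minimal sketch of one way to obtain model-sized inputs from such a sequence is to draw random fixed-length windows from it. Whether this script does so is not shown in the log; the function `sample_windows`, its parameters, and the stand-in token list are hypothetical.

```python
import random


def sample_windows(token_ids, window_len, n_samples, seed=0):
    """Draw n_samples contiguous windows of window_len tokens each."""
    if len(token_ids) < window_len:
        raise ValueError("sequence is shorter than one window")
    rng = random.Random(seed)
    windows = []
    for _ in range(n_samples):
        # randint is inclusive on both ends, so the last valid start is
        # len(token_ids) - window_len.
        start = rng.randint(0, len(token_ids) - window_len)
        windows.append(token_ids[start:start + window_len])
    return windows


# Stand-in for a long tokenized corpus (the real one has 2837981 tokens).
corpus = list(range(10_000))
windows = sample_windows(corpus, window_len=512, n_samples=4)
print([len(w) for w in windows])
```

Each window then fits within the model's maximum sequence length, so no indexing errors are triggered.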
Starting ...
Ready.
0 layer.0.SelfAttention.q
Quantizing ...
time 0.55
error 142.37025451660156
0 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 9521.5029296875
0 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 2544.900390625
0 layer.0.SelfAttention.o
Quantizing ...
time 0.28
error 123186.2578125
0 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 11158.978515625
0 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 9518.11328125
0 layer.1.DenseReluDense.wo
Quantizing ...
time 0.72
error 3637286.0
1 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 536.7674560546875
1 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 25588.546875
1 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 1919.272216796875
1 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 47080.5625
1 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 9808.359375
1 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 6298.18896484375
1 layer.1.DenseReluDense.wo
Quantizing ...
time 0.71
error 137391.875
2 layer.0.SelfAttention.q
Quantizing ...
time 0.41
error 125.06156921386719
2 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 6493.82568359375
2 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 1306.6259765625
2 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 3543.05029296875
2 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 10326.599609375
2 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 8165.3193359375
2 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 105276.7265625
3 layer.0.SelfAttention.q
Quantizing ...
time 0.41
error 137.07083129882812
3 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 7485.19384765625
3 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 1563.48095703125
3 layer.0.SelfAttention.o
Quantizing ...
time 0.27
error 3057.40673828125
3 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 10634.482421875
3 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.27
error 9444.2841796875
3 layer.1.DenseReluDense.wo
Quantizing ...
time 0.73
error 105683.125
4 layer.0.SelfAttention.q
Quantizing ...
time 0.41
error 133.7151336669922
4 layer.0.SelfAttention.k
Quantizing ...
time 0.27
error 7297.93896484375
4 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 1610.62939453125
4 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 7214.41796875
4 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 14451.642578125
4 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 15960.328125
4 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 4980679168.0
5 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 140.4214324951172
5 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 7479.8193359375
5 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 2484.518310546875
5 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 8618.46484375
5 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.27
error 10754.0419921875
5 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 13012.9423828125
5 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 107111.1875
6 layer.0.SelfAttention.q
Quantizing ...
time 0.40
error 112.6629867553711
6 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 7047.806640625
6 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 2059.9892578125
6 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 5445.0029296875
6 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.26
error 11107.181640625
6 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 15983.3603515625
6 layer.1.DenseReluDense.wo
Quantizing ...
time 0.70
error 685753216.0
7 layer.0.SelfAttention.q
Quantizing ...
time 0.41
error 133.351806640625
7 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 8262.615234375
7 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 2878.16943359375
7 layer.0.SelfAttention.o
Quantizing ...
time 0.27
error 17972.373046875
7 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 11895.857421875
7 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 18337.82421875
7 layer.1.DenseReluDense.wo
Quantizing ...
time 0.72
error 25902379008.0
8 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 120.18170928955078
8 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 7699.7255859375
8 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 2972.5712890625
8 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 8750.123046875
8 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 11126.8662109375
8 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 18306.9609375
8 layer.1.DenseReluDense.wo
Quantizing ...
time 0.71
error 128990.28125
9 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 126.16083526611328
9 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 8584.9208984375
9 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 3245.54541015625
9 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 15868.41015625
9 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 9290.447265625
9 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 17894.17578125
9 layer.1.DenseReluDense.wo
Quantizing ...
time 0.71
error 149863.296875
10 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 107.48172760009766
10 layer.0.SelfAttention.k
Quantizing ...
time 0.27
error 6898.35595703125
10 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 3770.64990234375
10 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 17137.037109375
10 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.27
error 8128.5166015625
10 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.26
error 17371.587890625
10 layer.1.DenseReluDense.wo
Quantizing ...
time 0.73
error 116027.1015625
11 layer.0.SelfAttention.q
Quantizing ...
time 0.40
error 104.61625671386719
11 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 7259.4208984375
11 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 5005.52490234375
11 layer.0.SelfAttention.o
Quantizing ...
time 0.27
error 32728.1015625
11 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 8535.056640625
11 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.27
error 22538.978515625
11 layer.1.DenseReluDense.wo
Quantizing ...
time 0.71
error 170254.40625
12 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 94.82140350341797
12 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 6448.5205078125
12 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 5083.41796875
12 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 60036.953125
12 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.26
error 7829.4384765625
12 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.26
error 23411.65234375
12 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 231657.15625
13 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 90.77069091796875
13 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 5828.037109375
13 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 4888.35302734375
13 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 41515.46484375
13 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 7063.1728515625
13 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 23648.7421875
13 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 261193.75
14 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 77.24964904785156
14 layer.0.SelfAttention.k
Quantizing ...
time 0.27
error 5096.2626953125
14 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 6915.9384765625
14 layer.0.SelfAttention.o
Quantizing ...
time 0.26
error 56402.62890625
14 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.28
error 6039.11328125
14 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 24090.625
14 layer.1.DenseReluDense.wo
Quantizing ...
time 0.71
error 355204.3125
15 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 72.92942810058594
15 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 5561.1201171875
15 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 8621.376953125
15 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 146386.5625
15 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 5684.064453125
15 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 26869.12109375
15 layer.1.DenseReluDense.wo
Quantizing ...
time 0.70
error 361036.25
16 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 75.83228302001953
16 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 5176.50341796875
16 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 9754.8203125
16 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 231755.03125
16 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.27
error 5699.75390625
16 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 25039.771484375
16 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 651520.75
17 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 61.858299255371094
17 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 4369.08251953125
17 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 12425.16796875
17 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 408129.875
17 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 5317.8798828125
17 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 26979.31640625
17 layer.1.DenseReluDense.wo
Quantizing ...
time 0.73
error 689154.875
18 layer.0.SelfAttention.q
Quantizing ...
time 0.41
error 68.12550354003906
18 layer.0.SelfAttention.k
Quantizing ...
time 0.27
error 4010.4833984375
18 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 14657.2314453125
18 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 206627.5
18 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.28
error 6068.525390625
18 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 28093.669921875
18 layer.1.DenseReluDense.wo
Quantizing ...
time 0.72
error 1019951.8125
19 layer.0.SelfAttention.q
Quantizing ...
time 0.41
error 57.68662643432617
19 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 4086.83349609375
19 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 14453.2578125
19 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 460674.0
19 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 5235.9794921875
19 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.26
error 28788.4765625
19 layer.1.DenseReluDense.wo
Quantizing ...
time 0.70
error 1332541.0
20 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 42.9056510925293
20 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 2894.2177734375
20 layer.0.SelfAttention.v
Quantizing ...
time 0.25
error 16684.044921875
20 layer.0.SelfAttention.o
Quantizing ...
time 0.25
error 557086.6875
20 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 6791.15625
20 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 38994.37890625
20 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 2295082.0
21 layer.0.SelfAttention.q
Quantizing ...
time 0.41
error 58.024559020996094
21 layer.0.SelfAttention.k
Quantizing ...
time 0.25
error 3534.38427734375
21 layer.0.SelfAttention.v
Quantizing ...
time 0.28
error 23622.609375
21 layer.0.SelfAttention.o
Quantizing ...
time 0.26
error 630538.75
21 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.27
error 6944.4306640625
21 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 41437.5546875
21 layer.1.DenseReluDense.wo
Quantizing ...
time 0.72
error 2805766.25
22 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 56.98418426513672
22 layer.0.SelfAttention.k
Quantizing ...
time 0.27
error 2588.40576171875
22 layer.0.SelfAttention.v
Quantizing ...
time 0.26
error 33727.3125
22 layer.0.SelfAttention.o
Quantizing ...
time 0.26
error 1536184.5
22 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.28
error 7638.18701171875
22 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 49872.0859375
22 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 4077312.5
23 layer.0.SelfAttention.q
Quantizing ...
time 0.40
error 53.174556732177734
23 layer.0.SelfAttention.k
Quantizing ...
time 0.26
error 2663.560302734375
23 layer.0.SelfAttention.v
Quantizing ...
time 0.27
error 35553.75
23 layer.0.SelfAttention.o
Quantizing ...
time 0.26
error 1983365.75
23 layer.1.DenseReluDense.wi_0
Quantizing ...
time 0.25
error 8208.654296875
23 layer.1.DenseReluDense.wi_1
Quantizing ...
time 0.25
error 51633.640625
23 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 8843078.0
114.8298749923706
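Most `DenseReluDense.wo` errors above are in the 1e5 to 1e6 range, but blocks 4, 6 and 7 report values between roughly 1e8 and 1e10. A small parser can surface such entries automatically. The sketch below assumes the four-line record layout seen in this log (header, `Quantizing ...`, `time`, `error`); the helper names and the 100x-median threshold are arbitrary choices for illustration, not part of the quantization script.

```python
import re
from collections import defaultdict
from statistics import median

# Header lines look like "4 layer.1.DenseReluDense.wo".
HEADER = re.compile(r"^(\d+) (layer\.\S+)$")


def parse_quant_log(lines):
    """Return (block, module, error) tuples from the per-layer log lines."""
    records, current = [], None
    for line in lines:
        line = line.strip()
        match = HEADER.match(line)
        if match:
            current = (int(match.group(1)), match.group(2))
        elif line.startswith("error ") and current is not None:
            records.append((current[0], current[1], float(line.split()[1])))
            current = None
    return records


def flag_outliers(records, factor=100.0):
    """Flag records whose error exceeds factor times the median error
    of the same module type across blocks (threshold is an assumption)."""
    by_module = defaultdict(list)
    for _, module, err in records:
        by_module[module].append(err)
    return [r for r in records if r[2] > factor * median(by_module[r[1]])]


# Excerpt copied from the log above (blocks 3 to 5, two modules each).
sample_log = """\
3 layer.0.SelfAttention.q
Quantizing ...
time 0.41
error 137.07083129882812
3 layer.1.DenseReluDense.wo
Quantizing ...
time 0.73
error 105683.125
4 layer.0.SelfAttention.q
Quantizing ...
time 0.41
error 133.7151336669922
4 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 4980679168.0
5 layer.0.SelfAttention.q
Quantizing ...
time 0.39
error 140.4214324951172
5 layer.1.DenseReluDense.wo
Quantizing ...
time 0.69
error 107111.1875
"""

records = parse_quant_log(sample_log.splitlines())
flagged = flag_outliers(records)
print(flagged)
```

Comparing each error against the median for the same module type, rather than against a global threshold, keeps the naturally larger `wo` errors from masking or being confused with genuine outliers.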
Packing ...
encoder.block.0.layer.0.SelfAttention.q
encoder.block.0.layer.0.SelfAttention.k
encoder.block.0.layer.0.SelfAttention.v
encoder.block.0.layer.0.SelfAttention.o
encoder.block.0.layer.1.DenseReluDense.wi_0
encoder.block.0.layer.1.DenseReluDense.wi_1
encoder.block.0.layer.1.DenseReluDense.wo
encoder.block.1.layer.0.SelfAttention.q
encoder.block.1.layer.0.SelfAttention.k
encoder.block.1.layer.0.SelfAttention.v
encoder.block.1.layer.0.SelfAttention.o
encoder.block.1.layer.1.DenseReluDense.wi_0
encoder.block.1.layer.1.DenseReluDense.wi_1
encoder.block.1.layer.1.DenseReluDense.wo
encoder.block.2.layer.0.SelfAttention.q
encoder.block.2.layer.0.SelfAttention.k
encoder.block.2.layer.0.SelfAttention.v
encoder.block.2.layer.0.SelfAttention.o
encoder.block.2.layer.1.DenseReluDense.wi_0
encoder.block.2.layer.1.DenseReluDense.wi_1
encoder.block.2.layer.1.DenseReluDense.wo
encoder.block.3.layer.0.SelfAttention.q
encoder.block.3.layer.0.SelfAttention.k
encoder.block.3.layer.0.SelfAttention.v
encoder.block.3.layer.0.SelfAttention.o
encoder.block.3.layer.1.DenseReluDense.wi_0
encoder.block.3.layer.1.DenseReluDense.wi_1
encoder.block.3.layer.1.DenseReluDense.wo
encoder.block.4.layer.0.SelfAttention.q
encoder.block.4.layer.0.SelfAttention.k
encoder.block.4.layer.0.SelfAttention.v
encoder.block.4.layer.0.SelfAttention.o
encoder.block.4.layer.1.DenseReluDense.wi_0
encoder.block.4.layer.1.DenseReluDense.wi_1
encoder.block.4.layer.1.DenseReluDense.wo
encoder.block.5.layer.0.SelfAttention.q
encoder.block.5.layer.0.SelfAttention.k
encoder.block.5.layer.0.SelfAttention.v
encoder.block.5.layer.0.SelfAttention.o
encoder.block.5.layer.1.DenseReluDense.wi_0
encoder.block.5.layer.1.DenseReluDense.wi_1
encoder.block.5.layer.1.DenseReluDense.wo
encoder.block.6.layer.0.SelfAttention.q
encoder.block.6.layer.0.SelfAttention.k
encoder.block.6.layer.0.SelfAttention.v
encoder.block.6.layer.0.SelfAttention.o
encoder.block.6.layer.1.DenseReluDense.wi_0
encoder.block.6.layer.1.DenseReluDense.wi_1
encoder.block.6.layer.1.DenseReluDense.wo
encoder.block.7.layer.0.SelfAttention.q
encoder.block.7.layer.0.SelfAttention.k
encoder.block.7.layer.0.SelfAttention.v
encoder.block.7.layer.0.SelfAttention.o
encoder.block.7.layer.1.DenseReluDense.wi_0
encoder.block.7.layer.1.DenseReluDense.wi_1
encoder.block.7.layer.1.DenseReluDense.wo
encoder.block.8.layer.0.SelfAttention.q
encoder.block.8.layer.0.SelfAttention.k
encoder.block.8.layer.0.SelfAttention.v
encoder.block.8.layer.0.SelfAttention.o
encoder.block.8.layer.1.DenseReluDense.wi_0
encoder.block.8.layer.1.DenseReluDense.wi_1
encoder.block.8.layer.1.DenseReluDense.wo
encoder.block.9.layer.0.SelfAttention.q
encoder.block.9.layer.0.SelfAttention.k
encoder.block.9.layer.0.SelfAttention.v
encoder.block.9.layer.0.SelfAttention.o
encoder.block.9.layer.1.DenseReluDense.wi_0
encoder.block.9.layer.1.DenseReluDense.wi_1
encoder.block.9.layer.1.DenseReluDense.wo
encoder.block.10.layer.0.SelfAttention.q
encoder.block.10.layer.0.SelfAttention.k
encoder.block.10.layer.0.SelfAttention.v
encoder.block.10.layer.0.SelfAttention.o
encoder.block.10.layer.1.DenseReluDense.wi_0
encoder.block.10.layer.1.DenseReluDense.wi_1
encoder.block.10.layer.1.DenseReluDense.wo
encoder.block.11.layer.0.SelfAttention.q
encoder.block.11.layer.0.SelfAttention.k
encoder.block.11.layer.0.SelfAttention.v
encoder.block.11.layer.0.SelfAttention.o
encoder.block.11.layer.1.DenseReluDense.wi_0
encoder.block.11.layer.1.DenseReluDense.wi_1
encoder.block.11.layer.1.DenseReluDense.wo
encoder.block.12.layer.0.SelfAttention.q
encoder.block.12.layer.0.SelfAttention.k
encoder.block.12.layer.0.SelfAttention.v
encoder.block.12.layer.0.SelfAttention.o
encoder.block.12.layer.1.DenseReluDense.wi_0
encoder.block.12.layer.1.DenseReluDense.wi_1
encoder.block.12.layer.1.DenseReluDense.wo
encoder.block.13.layer.0.SelfAttention.q
encoder.block.13.layer.0.SelfAttention.k
encoder.block.13.layer.0.SelfAttention.v
encoder.block.13.layer.0.SelfAttention.o
encoder.block.13.layer.1.DenseReluDense.wi_0
encoder.block.13.layer.1.DenseReluDense.wi_1
encoder.block.13.layer.1.DenseReluDense.wo
encoder.block.14.layer.0.SelfAttention.q
encoder.block.14.layer.0.SelfAttention.k
encoder.block.14.layer.0.SelfAttention.v
encoder.block.14.layer.0.SelfAttention.o
encoder.block.14.layer.1.DenseReluDense.wi_0
encoder.block.14.layer.1.DenseReluDense.wi_1
encoder.block.14.layer.1.DenseReluDense.wo
encoder.block.15.layer.0.SelfAttention.q
encoder.block.15.layer.0.SelfAttention.k
encoder.block.15.layer.0.SelfAttention.v
encoder.block.15.layer.0.SelfAttention.o
encoder.block.15.layer.1.DenseReluDense.wi_0
encoder.block.15.layer.1.DenseReluDense.wi_1
encoder.block.15.layer.1.DenseReluDense.wo
encoder.block.16.layer.0.SelfAttention.q
encoder.block.16.layer.0.SelfAttention.k
encoder.block.16.layer.0.SelfAttention.v
encoder.block.16.layer.0.SelfAttention.o
encoder.block.16.layer.1.DenseReluDense.wi_0
encoder.block.16.layer.1.DenseReluDense.wi_1
encoder.block.16.layer.1.DenseReluDense.wo
encoder.block.17.layer.0.SelfAttention.q
encoder.block.17.layer.0.SelfAttention.k
encoder.block.17.layer.0.SelfAttention.v
encoder.block.17.layer.0.SelfAttention.o
encoder.block.17.layer.1.DenseReluDense.wi_0
encoder.block.17.layer.1.DenseReluDense.wi_1
encoder.block.17.layer.1.DenseReluDense.wo
encoder.block.18.layer.0.SelfAttention.q
encoder.block.18.layer.0.SelfAttention.k
encoder.block.18.layer.0.SelfAttention.v
encoder.block.18.layer.0.SelfAttention.o
encoder.block.18.layer.1.DenseReluDense.wi_0
encoder.block.18.layer.1.DenseReluDense.wi_1
encoder.block.18.layer.1.DenseReluDense.wo
encoder.block.19.layer.0.SelfAttention.q
encoder.block.19.layer.0.SelfAttention.k
encoder.block.19.layer.0.SelfAttention.v
encoder.block.19.layer.0.SelfAttention.o
encoder.block.19.layer.1.DenseReluDense.wi_0
encoder.block.19.layer.1.DenseReluDense.wi_1
encoder.block.19.layer.1.DenseReluDense.wo
encoder.block.20.layer.0.SelfAttention.q
encoder.block.20.layer.0.SelfAttention.k
encoder.block.20.layer.0.SelfAttention.v
encoder.block.20.layer.0.SelfAttention.o
encoder.block.20.layer.1.DenseReluDense.wi_0
encoder.block.20.layer.1.DenseReluDense.wi_1
encoder.block.20.layer.1.DenseReluDense.wo
encoder.block.21.layer.0.SelfAttention.q
encoder.block.21.layer.0.SelfAttention.k
encoder.block.21.layer.0.SelfAttention.v
encoder.block.21.layer.0.SelfAttention.o
encoder.block.21.layer.1.DenseReluDense.wi_0
encoder.block.21.layer.1.DenseReluDense.wi_1
encoder.block.21.layer.1.DenseReluDense.wo
encoder.block.22.layer.0.SelfAttention.q
encoder.block.22.layer.0.SelfAttention.k
encoder.block.22.layer.0.SelfAttention.v
encoder.block.22.layer.0.SelfAttention.o
encoder.block.22.layer.1.DenseReluDense.wi_0
encoder.block.22.layer.1.DenseReluDense.wi_1
encoder.block.22.layer.1.DenseReluDense.wo
encoder.block.23.layer.0.SelfAttention.q
encoder.block.23.layer.0.SelfAttention.k
encoder.block.23.layer.0.SelfAttention.v
encoder.block.23.layer.0.SelfAttention.o
encoder.block.23.layer.1.DenseReluDense.wi_0
encoder.block.23.layer.1.DenseReluDense.wi_1
encoder.block.23.layer.1.DenseReluDense.wo
Done.