Downloading shards: 100% 3/3 [00:41<00:00, 13.87s/it]
Loading checkpoint shards: 100% 3/3 [00:07<00:00, 2.53s/it]
generation_config.json: 100% 115/115 [00:00<00:00, 575kB/s]
tokenizer_config.json: 100% 1.60k/1.60k [00:00<00:00, 8.48MB/s]
tokenizer.model: 100% 493k/493k [00:00<00:00, 22.9MB/s]
tokenizer.json: 100% 1.80M/1.80M [00:00<00:00, 7.43MB/s]
added_tokens.json: 100% 51.0/51.0 [00:00<00:00, 283kB/s]
special_tokens_map.json: 100% 420/420 [00:00<00:00, 1.74MB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Reconstructing layer: model.layers.25.mlp.down_proj
Reduced from torch.Size([4096]) to 3607
Layer mlp.down_proj_25 has already been modified. Skipping.
Restored original weights for layer: model.layers.25.mlp.down_proj
Reconstructing layer: model.layers.25.mlp.down_proj
Reduced from torch.Size([4096]) to 3607
Restored original weights for layer: model.layers.25.mlp.down_proj
['.31.', '.30.', '.29.', '.28.', '.27.', '.26.', '.25.', '.24.', '.23.', '.22.', '.21.', '.20.', '.19.', '.18.', '.17.', '.16.', '.15.', '.14.', '.13.', '.12.', '.11.', '.10.', '.9.', '.8.', '.7.', '.6.', '.5.', '.4.', '.3.', '.2.', '.1.', '.0.']
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
avg_loss = 2.1474520114478235: 100% 871/871 [00:46<00:00, 18.55it/s]
/usr/local/lib/python3.10/dist-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
avg_loss = 9.703152929898351: 100% 256/256 [00:13<00:00, 18.83it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
avg_loss = 13.355979550516967: 100% 264/264 [00:14<00:00, 18.66it/s]
==================================================
The initial perplexity of the model is 12.614558219909668
==================================================
Reconstructing layer: model.layers.31.mlp.down_proj
Reduced from torch.Size([4096]) to 3753
avg_loss = 2.150142833641832: 100% 871/871 [00:46<00:00, 18.75it/s]
avg_loss = 9.714343913365155: 100% 256/256 [00:13<00:00, 18.74it/s]
avg_loss = 13.374103391260812: 100% 264/264 [00:14<00:00, 18.43it/s]
Restored original weights for layer: model.layers.31.mlp.down_proj
Reconstructing layer: model.layers.31.mlp.up_proj
Reduced from torch.Size([4096]) to 3717
avg_loss = 2.1734046262660063: 100% 871/871 [00:46<00:00, 18.57it/s]
avg_loss = 9.82143080001697: 100% 256/256 [00:13<00:00, 18.57it/s]
avg_loss = 13.477815985228077: 100% 264/264 [00:14<00:00, 18.20it/s]
Restored original weights for layer: model.layers.31.mlp.up_proj
Reconstructing layer: model.layers.31.self_attn.q_proj
Reduced from torch.Size([4096]) to 818
avg_loss = 2.148138916040808: 100% 871/871 [00:46<00:00, 18.53it/s]
avg_loss = 9.705221582669765: 100% 256/256 [00:13<00:00, 18.62it/s]
avg_loss = 13.35540055280382: 100% 264/264 [00:14<00:00, 18.71it/s]
**************************************************
Improved perplexity found: 12.613171577453613 for layer self_attn.q_proj .31..
Total modifications is 1
**************************************************
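A note on what the "Reduced from torch.Size([4096]) to 3607" lines appear to show: torch.Size([4096]) is the shape of the weight matrix's singular-value vector, so each "reconstruction" looks like a low-rank rebuild via SVD, keeping the largest singular values and dropping the presumed-noise tail (4096 values down to 3607 retained). A minimal sketch of that step, assuming the retained count k is chosen elsewhere; the function name reconstruct_low_rank is illustrative, not the script's actual API:

```python
import torch

def reconstruct_low_rank(weight: torch.Tensor, k: int) -> torch.Tensor:
    """Rebuild `weight` from its top-k singular triplets (rank-k approximation)."""
    # Thin SVD of the (out_features, in_features) weight matrix; S has
    # min(out_features, in_features) entries -- the torch.Size([4096]) in the log.
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    # Keep only the k largest singular values; discard the tail.
    U_k, S_k, Vh_k = U[:, :k], S[:k], Vh[:k, :]
    return (U_k @ torch.diag(S_k) @ Vh_k).to(weight.dtype)
```

The shapes in the log are consistent with this reading: the 4096-wide projections (q_proj, o_proj, the MLP matrices) report 4096 singular values, while k_proj and v_proj report torch.Size([1024]), matching a grouped-query-attention KV width of 1024.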
Reconstructing layer: model.layers.31.self_attn.k_proj
Reduced from torch.Size([1024]) to 524
avg_loss = 2.1553964071514686: 100% 871/871 [00:46<00:00, 18.71it/s]
avg_loss = 9.734999645967036: 100% 256/256 [00:13<00:00, 18.84it/s]
avg_loss = 13.383289175954731: 100% 264/264 [00:14<00:00, 18.51it/s]
Restored original weights for layer: model.layers.31.self_attn.k_proj
Reconstructing layer: model.layers.31.self_attn.v_proj
Reduced from torch.Size([1024]) to 846
avg_loss = 2.1430855287339465: 100% 871/871 [00:46<00:00, 18.78it/s]
avg_loss = 9.666598222218454: 100% 256/256 [00:13<00:00, 18.74it/s]
avg_loss = 13.313674368641593: 100% 264/264 [00:14<00:00, 18.69it/s]
**************************************************
Improved perplexity found: 12.513681411743164 for layer self_attn.v_proj .31..
Total modifications is 2
**************************************************
Reconstructing layer: model.layers.31.self_attn.o_proj
Reduced from torch.Size([4096]) to 834
avg_loss = 2.1483869746960402: 100% 871/871 [00:47<00:00, 18.46it/s]
avg_loss = 9.686229056213051: 100% 256/256 [00:13<00:00, 18.78it/s]
avg_loss = 13.344844787861362: 100% 264/264 [00:14<00:00, 18.56it/s]
Restored original weights for layer: model.layers.31.self_attn.o_proj
Reconstructing layer: model.layers.30.mlp.down_proj
Reduced from torch.Size([4096]) to 3770
avg_loss = 2.1505854418576105: 100% 871/871 [00:47<00:00, 18.34it/s]
avg_loss = 9.6962159560062: 100% 256/256 [00:13<00:00, 18.63it/s]
avg_loss = 13.353956826256983: 100% 264/264 [00:14<00:00, 18.49it/s]
Restored original weights for layer: model.layers.30.mlp.down_proj
Reconstructing layer: model.layers.30.mlp.up_proj
Reduced from torch.Size([4096]) to 3787
avg_loss = 2.148582770547965: 100% 871/871 [00:47<00:00, 18.34it/s]
avg_loss = 9.686316559556872: 100% 256/256 [00:13<00:00, 18.59it/s]
avg_loss = 13.34067751738158: 100% 264/264 [00:14<00:00, 18.81it/s]
Restored original weights for layer: model.layers.30.mlp.up_proj
Reconstructing layer: model.layers.30.self_attn.q_proj
Reduced from torch.Size([4096]) to 819
avg_loss = 2.1425534111760927: 100% 871/871 [00:47<00:00, 18.40it/s]
avg_loss = 9.664284548722208: 100% 256/256 [00:13<00:00, 18.49it/s]
avg_loss = 13.309857179721197: 100% 264/264 [00:14<00:00, 18.63it/s]
**************************************************
Improved perplexity found: 12.504617691040039 for layer self_attn.q_proj .30..
Total modifications is 3
**************************************************
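Each candidate modification is scored by re-running the full evaluation: the three avg_loss progress bars (871, 256, and 264 batches) correspond to three held-out datasets, and the banner perplexity is derived from those losses. The standard conversion for a causal LM is perplexity = exp(mean token-level cross-entropy); here is a sketch for a single dataset, assuming a pre-tokenized input_ids tensor and a Hugging Face causal LM (how the script combines the three datasets into the single reported number is not visible in the log):

```python
import math
import torch

@torch.no_grad()
def dataset_perplexity(model, input_ids: torch.Tensor, window: int = 2048) -> float:
    """exp(mean cross-entropy) over non-overlapping windows of a tokenized corpus."""
    model.eval()
    losses = []
    for start in range(0, input_ids.size(1), window):
        chunk = input_ids[:, start : start + window]
        if chunk.size(1) < 2:  # need at least one next-token target
            break
        # With labels=input_ids, Hugging Face causal LMs shift the labels
        # internally and return the mean token-level cross-entropy as `loss`.
        out = model(chunk, labels=chunk)
        losses.append(out.loss.item())
    avg_loss = sum(losses) / len(losses)  # the number each progress bar tracks
    return math.exp(avg_loss)
```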
Reconstructing layer: model.layers.30.self_attn.k_proj
Reduced from torch.Size([1024]) to 524
avg_loss = 2.1449567824088884: 100% 871/871 [00:47<00:00, 18.51it/s]
avg_loss = 9.675114367622882: 100% 256/256 [00:13<00:00, 18.56it/s]
avg_loss = 13.32237600783507: 100% 264/264 [00:14<00:00, 18.72it/s]
Restored original weights for layer: model.layers.30.self_attn.k_proj
Reconstructing layer: model.layers.30.self_attn.v_proj
Reduced from torch.Size([1024]) to 812
avg_loss = 2.155356107294628: 100% 871/871 [00:47<00:00, 18.48it/s]
avg_loss = 9.7138080005534: 100% 256/256 [00:13<00:00, 18.37it/s]
avg_loss = 13.366635067444859: 100% 264/264 [00:14<00:00, 18.33it/s]
Restored original weights for layer: model.layers.30.self_attn.v_proj
Reconstructing layer: model.layers.30.self_attn.o_proj
Reduced from torch.Size([4096]) to 859
avg_loss = 2.146158002821641: 100% 871/871 [00:47<00:00, 18.33it/s]
avg_loss = 9.676836102735251: 100% 256/256 [00:13<00:00, 18.43it/s]
avg_loss = 13.318221795287998: 100% 264/264 [00:14<00:00, 18.33it/s]
Restored original weights for layer: model.layers.30.self_attn.o_proj
Reconstructing layer: model.layers.29.mlp.down_proj
Reduced from torch.Size([4096]) to 3763
avg_loss = 2.1450509054652587: 100% 871/871 [00:47<00:00, 18.35it/s]
avg_loss = 9.6743658403866: 100% 256/256 [00:14<00:00, 18.21it/s]
avg_loss = 13.321742536895202: 100% 264/264 [00:14<00:00, 18.19it/s]
Restored original weights for layer: model.layers.29.mlp.down_proj
Reconstructing layer: model.layers.29.mlp.up_proj
Reduced from torch.Size([4096]) to 3828
avg_loss = 2.1408350525165125: 100% 871/871 [00:47<00:00, 18.21it/s]
avg_loss = 9.65894997306168: 100% 256/256 [00:14<00:00, 18.26it/s]
avg_loss = 13.306687997146087: 100% 264/264 [00:14<00:00, 18.31it/s]
**************************************************
Improved perplexity found: 12.497097969055176 for layer mlp.up_proj .29..
Total modifications is 4
**************************************************
Reconstructing layer: model.layers.29.self_attn.q_proj
Reduced from torch.Size([4096]) to 803
avg_loss = 2.1367383972238043: 100% 871/871 [00:47<00:00, 18.18it/s]
avg_loss = 9.641230288892984: 100% 256/256 [00:13<00:00, 18.36it/s]
avg_loss = 13.289274643767964: 100% 264/264 [00:14<00:00, 18.47it/s]
**************************************************
Improved perplexity found: 12.455863952636719 for layer self_attn.q_proj .29..
Total modifications is 5
**************************************************
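Taken together, the log traces a greedy backward sweep: the bracketed list ['.31.', '.30.', ..., '.0.'] near the top is the visit order, each layer's MLP and attention projections are tried in turn, and a modification is kept only when it strictly improves on the best perplexity so far; everything else is rolled back ("Restored original weights"). A sketch of that control flow, assuming reconstruct_low_rank from above plus hypothetical helpers estimate_rank (the rank-selection rule) and evaluate_perplexity (a wrapper combining the three eval sets):

```python
import torch

@torch.no_grad()
def greedy_rank_reduction_sweep(model) -> None:
    best_ppl = evaluate_perplexity(model)  # hypothetical; 12.6146 in this run
    print("=" * 50)
    print(f"The initial perplexity of the model is {best_ppl}")
    print("=" * 50)

    layer_keys = [f".{i}." for i in range(31, -1, -1)]  # ['.31.', ..., '.0.']
    targets = ["mlp.down_proj", "mlp.up_proj", "self_attn.q_proj",
               "self_attn.k_proj", "self_attn.v_proj", "self_attn.o_proj"]
    modifications = 0

    for key in layer_keys:
        for target in targets:
            name = f"model.layers{key}{target}"    # e.g. model.layers.31.mlp.down_proj
            module = model.get_submodule(name)
            original = module.weight.data.clone()  # backup for a possible rollback
            print(f"Reconstructing layer: {name}")
            k = estimate_rank(module.weight)       # hypothetical noise-threshold rule
            module.weight.data = reconstruct_low_rank(module.weight.data, k)
            ppl = evaluate_perplexity(model)       # the three avg_loss passes
            if ppl < best_ppl:                     # keep strict improvements only
                best_ppl = ppl
                modifications += 1
                print(f"Improved perplexity found: {ppl} for layer {target} {key}.")
                print(f"Total modifications is {modifications}")
            else:
                module.weight.data = original      # roll back
                print(f"Restored original weights for layer: {name}")
```

Under this reading, the run shown above has found five keepers so far (q_proj and v_proj in layer 31, q_proj in layer 30, up_proj and q_proj in layer 29), taking perplexity from 12.6146 to 12.4559, with 29 more layers left to scan.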