Downloading shards: 100% 3/3 [00:41<00:00, 13.87s/it]
Loading checkpoint shards: 100% 3/3 [00:07<00:00, 2.53s/it]
generation_config.json: 100% 115/115 [00:00<00:00, 575kB/s]
tokenizer_config.json: 100% 1.60k/1.60k [00:00<00:00, 8.48MB/s]
tokenizer.model: 100% 493k/493k [00:00<00:00, 22.9MB/s]
tokenizer.json: 100% 1.80M/1.80M [00:00<00:00, 7.43MB/s]
added_tokens.json: 100% 51.0/51.0 [00:00<00:00, 283kB/s]
special_tokens_map.json: 100% 420/420 [00:00<00:00, 1.74MB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Reconstructing layer: model.layers.25.mlp.down_proj
Reduced from torch.Size([4096]) to 3607
Layer mlp.down_proj_25 has already been modified. Skipping.
Restored original weights for layer: model.layers.25.mlp.down_proj
Reconstructing layer: model.layers.25.mlp.down_proj
Reduced from torch.Size([4096]) to 3607
Restored original weights for layer: model.layers.25.mlp.down_proj
['.31.', '.30.', '.29.', '.28.', '.27.', '.26.', '.25.', '.24.', '.23.', '.22.', '.21.', '.20.', '.19.', '.18.', '.17.', '.16.', '.15.', '.14.', '.13.', '.12.', '.11.', '.10.', '.9.', '.8.', '.7.', '.6.', '.5.', '.4.', '.3.', '.2.', '.1.', '.0.']
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
avg_loss = 2.1474520114478235: 100% 871/871 [00:46<00:00, 18.55it/s]
/usr/local/lib/python3.10/dist-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
avg_loss = 9.703152929898351: 100% 256/256 [00:13<00:00, 18.83it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
avg_loss = 13.355979550516967: 100% 264/264 [00:14<00:00, 18.66it/s]
==================================================
The initial perplexity of the model is 12.614558219909668
==================================================
Reconstructing layer: model.layers.31.mlp.down_proj
Reduced from torch.Size([4096]) to 3753
avg_loss = 2.150142833641832: 100% 871/871 [00:46<00:00, 18.75it/s]
avg_loss = 9.714343913365155: 100% 256/256 [00:13<00:00, 18.74it/s]
avg_loss = 13.374103391260812: 100% 264/264 [00:14<00:00, 18.43it/s]
Restored original weights for layer: model.layers.31.mlp.down_proj
Reconstructing layer: model.layers.31.mlp.up_proj
Reduced from torch.Size([4096]) to 3717
avg_loss = 2.1734046262660063: 100% 871/871 [00:46<00:00, 18.57it/s]
avg_loss = 9.82143080001697: 100% 256/256 [00:13<00:00, 18.57it/s]
avg_loss = 13.477815985228077: 100% 264/264 [00:14<00:00, 18.20it/s]
Restored original weights for layer: model.layers.31.mlp.up_proj
Reconstructing layer: model.layers.31.self_attn.q_proj
Reduced from torch.Size([4096]) to 818
avg_loss = 2.148138916040808: 100% 871/871 [00:46<00:00, 18.53it/s]
avg_loss = 9.705221582669765: 100% 256/256 [00:13<00:00, 18.62it/s]
avg_loss = 13.35540055280382: 100% 264/264 [00:14<00:00, 18.71it/s]
**************************************************
Improved perplexity found: 12.613171577453613 for layer self_attn.q_proj .31..
Total modifications is 1
**************************************************
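A note on what the "Reduced from torch.Size([4096]) to 3607" lines appear to show: torch.Size([4096]) is the shape of the weight matrix's singular-value vector, so each "reconstruction" looks like a low-rank rebuild via SVD, keeping the largest singular values and dropping the presumed-noise tail (4096 values down to 3607 retained). A minimal sketch of that step, assuming the retained count k is chosen elsewhere; the function name reconstruct_low_rank is illustrative, not the script's actual API:

```python
import torch

def reconstruct_low_rank(weight: torch.Tensor, k: int) -> torch.Tensor:
    """Rebuild `weight` from its top-k singular triplets (rank-k approximation)."""
    # Thin SVD of the (out_features, in_features) weight matrix; S has
    # min(out_features, in_features) entries -- the torch.Size([4096]) in the log.
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    # Keep only the k largest singular values; discard the tail.
    U_k, S_k, Vh_k = U[:, :k], S[:k], Vh[:k, :]
    return (U_k @ torch.diag(S_k) @ Vh_k).to(weight.dtype)
```

The shapes in the log are consistent with this reading: the 4096-wide projections (q_proj, o_proj, the MLP matrices) report 4096 singular values, while k_proj and v_proj report torch.Size([1024]), matching a grouped-query-attention KV width of 1024.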
Reconstructing layer: model.layers.31.self_attn.k_proj
Reduced from torch.Size([1024]) to 524
avg_loss = 2.1553964071514686: 100% 871/871 [00:46<00:00, 18.71it/s]
avg_loss = 9.734999645967036: 100% 256/256 [00:13<00:00, 18.84it/s]
avg_loss = 13.383289175954731: 100% 264/264 [00:14<00:00, 18.51it/s]
Restored original weights for layer: model.layers.31.self_attn.k_proj
Reconstructing layer: model.layers.31.self_attn.v_proj
Reduced from torch.Size([1024]) to 846
avg_loss = 2.1430855287339465: 100% 871/871 [00:46<00:00, 18.78it/s]
avg_loss = 9.666598222218454: 100% 256/256 [00:13<00:00, 18.74it/s]
avg_loss = 13.313674368641593: 100% 264/264 [00:14<00:00, 18.69it/s]
**************************************************
Improved perplexity found: 12.513681411743164 for layer self_attn.v_proj .31..
Total modifications is 2
**************************************************
Reconstructing layer: model.layers.31.self_attn.o_proj
Reduced from torch.Size([4096]) to 834
avg_loss = 2.1483869746960402: 100% 871/871 [00:47<00:00, 18.46it/s]
avg_loss = 9.686229056213051: 100% 256/256 [00:13<00:00, 18.78it/s]
avg_loss = 13.344844787861362: 100% 264/264 [00:14<00:00, 18.56it/s]
Restored original weights for layer: model.layers.31.self_attn.o_proj
Reconstructing layer: model.layers.30.mlp.down_proj
Reduced from torch.Size([4096]) to 3770
avg_loss = 2.1505854418576105: 100% 871/871 [00:47<00:00, 18.34it/s]
avg_loss = 9.6962159560062: 100% 256/256 [00:13<00:00, 18.63it/s]
avg_loss = 13.353956826256983: 100% 264/264 [00:14<00:00, 18.49it/s]
Restored original weights for layer: model.layers.30.mlp.down_proj
Reconstructing layer: model.layers.30.mlp.up_proj
Reduced from torch.Size([4096]) to 3787
avg_loss = 2.148582770547965: 100% 871/871 [00:47<00:00, 18.34it/s]
avg_loss = 9.686316559556872: 100% 256/256 [00:13<00:00, 18.59it/s]
avg_loss = 13.34067751738158: 100% 264/264 [00:14<00:00, 18.81it/s]
Restored original weights for layer: model.layers.30.mlp.up_proj
Reconstructing layer: model.layers.30.self_attn.q_proj
Reduced from torch.Size([4096]) to 819
avg_loss = 2.1425534111760927: 100% 871/871 [00:47<00:00, 18.40it/s]
avg_loss = 9.664284548722208: 100% 256/256 [00:13<00:00, 18.49it/s]
avg_loss = 13.309857179721197: 100% 264/264 [00:14<00:00, 18.63it/s]
**************************************************
Improved perplexity found: 12.504617691040039 for layer self_attn.q_proj .30..
Total modifications is 3
**************************************************
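Each candidate modification is scored by re-running the full evaluation: the three avg_loss progress bars (871, 256, and 264 batches) correspond to three held-out datasets, and the banner perplexity is derived from those losses. The standard conversion for a causal LM is perplexity = exp(mean token-level cross-entropy); here is a sketch for a single dataset, assuming a pre-tokenized input_ids tensor and a Hugging Face causal LM (how the script combines the three datasets into the single reported number is not visible in the log):

```python
import math
import torch

@torch.no_grad()
def dataset_perplexity(model, input_ids: torch.Tensor, window: int = 2048) -> float:
    """exp(mean cross-entropy) over non-overlapping windows of a tokenized corpus."""
    model.eval()
    losses = []
    for start in range(0, input_ids.size(1), window):
        chunk = input_ids[:, start : start + window]
        if chunk.size(1) < 2:  # need at least one next-token target
            break
        # With labels=input_ids, Hugging Face causal LMs shift the labels
        # internally and return the mean token-level cross-entropy as `loss`.
        out = model(chunk, labels=chunk)
        losses.append(out.loss.item())
    avg_loss = sum(losses) / len(losses)  # the number each progress bar tracks
    return math.exp(avg_loss)
```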
Reconstructing layer: model.layers.30.self_attn.k_proj
Reduced from torch.Size([1024]) to 524
avg_loss = 2.1449567824088884: 100% 871/871 [00:47<00:00, 18.51it/s]
avg_loss = 9.675114367622882: 100% 256/256 [00:13<00:00, 18.56it/s]
avg_loss = 13.32237600783507: 100% 264/264 [00:14<00:00, 18.72it/s]
Restored original weights for layer: model.layers.30.self_attn.k_proj
Reconstructing layer: model.layers.30.self_attn.v_proj
Reduced from torch.Size([1024]) to 812
avg_loss = 2.155356107294628: 100% 871/871 [00:47<00:00, 18.48it/s]
avg_loss = 9.7138080005534: 100% 256/256 [00:13<00:00, 18.37it/s]
avg_loss = 13.366635067444859: 100% 264/264 [00:14<00:00, 18.33it/s]
Restored original weights for layer: model.layers.30.self_attn.v_proj
Reconstructing layer: model.layers.30.self_attn.o_proj
Reduced from torch.Size([4096]) to 859
avg_loss = 2.146158002821641: 100% 871/871 [00:47<00:00, 18.33it/s]
avg_loss = 9.676836102735251: 100% 256/256 [00:13<00:00, 18.43it/s]
avg_loss = 13.318221795287998: 100% 264/264 [00:14<00:00, 18.33it/s]
Restored original weights for layer: model.layers.30.self_attn.o_proj
Reconstructing layer: model.layers.29.mlp.down_proj
Reduced from torch.Size([4096]) to 3763
avg_loss = 2.1450509054652587: 100% 871/871 [00:47<00:00, 18.35it/s]
avg_loss = 9.6743658403866: 100% 256/256 [00:14<00:00, 18.21it/s]
avg_loss = 13.321742536895202: 100% 264/264 [00:14<00:00, 18.19it/s]
Restored original weights for layer: model.layers.29.mlp.down_proj
Reconstructing layer: model.layers.29.mlp.up_proj
Reduced from torch.Size([4096]) to 3828
avg_loss = 2.1408350525165125: 100% 871/871 [00:47<00:00, 18.21it/s]
avg_loss = 9.65894997306168: 100% 256/256 [00:14<00:00, 18.26it/s]
avg_loss = 13.306687997146087: 100% 264/264 [00:14<00:00, 18.31it/s]
**************************************************
Improved perplexity found: 12.497097969055176 for layer mlp.up_proj .29..
Total modifications is 4
**************************************************
Reconstructing layer: model.layers.29.self_attn.q_proj
Reduced from torch.Size([4096]) to 803
avg_loss = 2.1367383972238043: 100% 871/871 [00:47<00:00, 18.18it/s]
avg_loss = 9.641230288892984: 100% 256/256 [00:13<00:00, 18.36it/s]
avg_loss = 13.289274643767964: 100% 264/264 [00:14<00:00, 18.47it/s]
**************************************************
Improved perplexity found: 12.455863952636719 for layer self_attn.q_proj .29..
Total modifications is 5
**************************************************
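Taken together, the log traces a greedy backward sweep: the bracketed list ['.31.', '.30.', ..., '.0.'] near the top is the visit order, each layer's MLP and attention projections are tried in turn, and a modification is kept only when it strictly improves on the best perplexity so far; everything else is rolled back ("Restored original weights"). A sketch of that control flow, assuming reconstruct_low_rank from above plus hypothetical helpers estimate_rank (the rank-selection rule) and evaluate_perplexity (a wrapper combining the three eval sets):

```python
import torch

@torch.no_grad()
def greedy_rank_reduction_sweep(model) -> None:
    best_ppl = evaluate_perplexity(model)  # hypothetical; 12.6146 in this run
    print("=" * 50)
    print(f"The initial perplexity of the model is {best_ppl}")
    print("=" * 50)

    layer_keys = [f".{i}." for i in range(31, -1, -1)]  # ['.31.', ..., '.0.']
    targets = ["mlp.down_proj", "mlp.up_proj", "self_attn.q_proj",
               "self_attn.k_proj", "self_attn.v_proj", "self_attn.o_proj"]
    modifications = 0

    for key in layer_keys:
        for target in targets:
            name = f"model.layers{key}{target}"    # e.g. model.layers.31.mlp.down_proj
            module = model.get_submodule(name)
            original = module.weight.data.clone()  # backup for a possible rollback
            print(f"Reconstructing layer: {name}")
            k = estimate_rank(module.weight)       # hypothetical noise-threshold rule
            module.weight.data = reconstruct_low_rank(module.weight.data, k)
            ppl = evaluate_perplexity(model)       # the three avg_loss passes
            if ppl < best_ppl:                     # keep strict improvements only
                best_ppl = ppl
                modifications += 1
                print(f"Improved perplexity found: {ppl} for layer {target} {key}.")
                print(f"Total modifications is {modifications}")
            else:
                module.weight.data = original      # roll back
                print(f"Restored original weights for layer: {name}")
```

Under this reading, the run shown above has found five keepers so far (q_proj and v_proj in layer 31, q_proj in layer 30, up_proj and q_proj in layer 29), taking perplexity from 12.6146 to 12.4559, with 29 more layers left to scan.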