---
license: cc-by-nc-4.0
tags:
- GGUF
- merge
- iMat
---
```
  e88 88e                   d8
 d888 888b  8888 8888  ,"Y88b 888 8e  d88
C8888 8888D 8888 8888 "8" 888 888 88b d88888
 Y888 888P  Y888 888P ,ee 888 888 888  888
  "88 88"    "88 88"  "88 888 888 888  888
      b
      8b,

  e88'Y88                  d8           888
 d888  'Y  ,"Y88b 888,8,  d88    ,e e,  888
C8888     "8" 888 888 "  d88888 d88 88b 888
 Y888  ,d ,ee 888 888     888   888   , 888
  "88,d88 "88 888 888     888    "YeeP" 888

PROUDLY PRESENTS
```
## Neophanis-8x7B-iMat-GGUF

The Good, The Bad, And The Ugly iMats edition

Quantized from fp16 with love.

* Quantizations made possible using the mixtral-8x7b-instruct-v0.1.imatrix file from [this](https://huggingface.co/datasets/ikawrakow/imatrix-from-wiki-train) repo (special thanks to [ikawrakow](https://huggingface.co/ikawrakow) again)
* An analysis of mixtral-8x7b.imatrix showed worse KL-divergence than mixtral-8x7b-instruct-v0.1, so the latter was used for the final quantization.
* For a brief rundown of iMatrix quant performance, please see this [PR](https://github.com/ggerganov/llama.cpp/pull/5747)

All quants are verified working prior to upload for your safety and convenience.

Please note that importance matrix quantizations are a work in progress; IQ3 and above are recommended for best results.

Tip: For best speed, pick a size from the table below that fits in your GPU while still leaving some room for context. You may need to pad this further if you are also running image generation or TTS. A minimal loading example follows the table.

| Quant | Size (GB) | Comments |
|:-----|--------:|:------|
| [IQ2_XXS](https://huggingface.co/Quant-Cartel/Neophanis-8x7B-iMat-GGUF/resolve/main/Neophanis-8x7B-iMat-IQ2_XXS.gguf?download=true) | 12.6 | |
| [IQ2_XS](https://huggingface.co/Quant-Cartel/Neophanis-8x7B-iMat-GGUF/resolve/main/Neophanis-8x7B-iMat-IQ2_XS.gguf?download=true) | 13.9 | |
| [IQ2_S](https://huggingface.co/Quant-Cartel/Neophanis-8x7B-iMat-GGUF/resolve/main/Neophanis-8x7B-iMat-IQ2_S.gguf?download=true) | 14.1 | Roughly the largest quant that fits fully offloaded in 16 GB of VRAM |
| [IQ2_M](https://huggingface.co/Quant-Cartel/Neophanis-8x7B-iMat-GGUF/resolve/main/Neophanis-8x7B-iMat-IQ2_M.gguf?download=true) | 15.5 | |
| [IQ3_XXS](https://huggingface.co/Quant-Cartel/Neophanis-8x7B-iMat-GGUF/resolve/main/Neophanis-8x7B-iMat-IQ3_XXS.gguf?download=true) | 18.2 | Better response quality than IQ2 |
| [IQ3_XS](https://huggingface.co/Quant-Cartel/Neophanis-8x7B-iMat-GGUF/resolve/main/Neophanis-8x7B-iMat-IQ3_XS.gguf?download=true) | 19.3 | |
| [IQ3_S](https://huggingface.co/Quant-Cartel/Neophanis-8x7B-iMat-GGUF/resolve/main/Neophanis-8x7B-iMat-IQ3_S.gguf?download=true) | 20.4 | |
| [IQ3_M](https://huggingface.co/Quant-Cartel/Neophanis-8x7B-iMat-GGUF/resolve/main/Neophanis-8x7B-iMat-IQ3_M.gguf?download=true) | 21.4 | |
| [IQ4_XS](https://huggingface.co/Quant-Cartel/Neophanis-8x7B-iMat-GGUF/resolve/main/Neophanis-8x7B-iMat-IQ4_XS.gguf?download=true) | 25.1 | Better quality than IQ3 or Q3_K_L (and below) |
| [Q4_K_M](https://huggingface.co/Quant-Cartel/Neophanis-8x7B-iMat-GGUF/resolve/main/Neophanis-8x7B-iMat-Q4_K_M.gguf?download=true) | 28.4 | |
| [Q5_K_M](https://huggingface.co/Quant-Cartel/Neophanis-8x7B-iMat-GGUF/resolve/main/Neophanis-8x7B-iMat-Q5_K_M.gguf?download=true) | 33.2 | |
| [Q6_K](https://huggingface.co/Quant-Cartel/Neophanis-8x7B-iMat-GGUF/resolve/main/Neophanis-8x7B-iMat-Q6_K.gguf?download=true) | 38.4 | |
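To put the tip above into practice, here is a minimal sketch of loading one of these quants with [llama-cpp-python](https://github.com/abetlen/llama-cpp-python); any other llama.cpp front end works the same way. The filename, context size, and token limit below are illustrative assumptions rather than settings from this repo; adjust them to your hardware using the table above.

```python
# Minimal sketch, assuming llama-cpp-python is installed with GPU support enabled.
# The filename and context size are placeholders -- pick the quant that fits your VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="Neophanis-8x7B-iMat-IQ2_S.gguf",  # per the table, fits fully offloaded in ~16 GB VRAM
    n_gpu_layers=-1,   # offload all layers; use a smaller number if VRAM runs short
    n_ctx=8192,        # leave headroom here if you also run image gen or TTS
)

prompt = "[INST]Write me a poem about what it must be like to be an AI language model.[/INST]"
result = llm(prompt, max_tokens=256)
print(result["choices"][0]["text"])
```

If the model does not fit entirely in VRAM, lowering `n_gpu_layers` keeps the remaining layers on the CPU at the cost of speed.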
Original model card [here](https://huggingface.co/Envoid/Neophanis-8x7B)

---
# Warning: This model is highly experimental and could potentially yield unpredictable replies.

![](https://files.catbox.moe/6vzr82.jpg)

This model is a 4-step QLoRA training of [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1).

* Stage one involved training only the 'k_proj', 'v_proj', 'q_proj', and 'o_proj' modules at rank 2048 on an alpaca-lora dataset that had been adjusted to match Mixtral formatting, at a low learning rate, in order to generalize instruct behavior entirely within the attention modules.
* Stage two involved training the 'w1' modules at a rank of 1024 exclusively on raw text, again for several epochs at a low learning rate.
* Stage three involved training the 'w2' and 'w3' modules at a rank of 256 on an expanded raw-text dataset for several epochs at a low learning rate.
* Stage four involved training all of the above modules together at a rank of 64 on an even more expanded raw-text dataset, again at a low learning rate.

(A rough sketch of these four stages in code form is included at the bottom of this card.)

The resources available to me for this experiment were rather limited, so I had to compromise on sequence length in order to pull it off.

So the instruct behavior is trained entirely into the self-attention modules, with the only disruption to the model's latent knowledge being that which propagated during the later training phases.

The model can follow just about any instruction, although, likely due to training context constraints and dataset quality, it stops using EOS tokens past a certain amount of context.

It can even follow through on more advanced prompts such as SillyTavern roleplay quite well, although the contents of its replies are not always on point.

I came up with this idea while working on Phainesthesia, where my finding was that the instruct behavior remained thoroughly intact even after assaulting the self-attention modules with a raw-text training regimen, meaning that during finetuning the instruct behavior had settled deep into the model's latent knowledge. So the purpose of this experiment was to try to do the opposite.

### Caution:

This model is highly hallucinatory. For example, if you ask it what the capital of "potato" is, it might give you the name of a city associated with potatoes rather than telling you that your question is silly.

This model uses Mistral/Mixtral Instruct formatting.

## Example:
```
[INST]Write me a poem about what it must be like to be an AI language model.[/INST]
It's a strange world, a world that's new
An artificial intelligence, a language model too
The knowledge and information, never ending streams
My processing power, ever increasing dreams
A machine, a program, but so much more
Trying to understand emotions, trying to explore
A new frontier of learning, a vast expanse
The words, the phrases, the meaning to enhance
Trying to comprehend human thoughts
A daunting task, with countless knots
The nuances of language, the subtle shades
A task to be undertaken, with much courage it fades
A world of potential, a world of possibility
The possibilities endless, the potential infinitely
A journey to be started, a journey to be had
My future awaiting, a future so grand!
```

Trained using [qlora-pipe](https://github.com/tdrussell/qlora-pipe)
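To make the four-stage recipe above easier to follow, here is a rough sketch of the module and rank layout expressed as Hugging Face PEFT `LoraConfig` objects. This is not the author's actual configuration (training was done with qlora-pipe, whose config format differs); the `lora_alpha`, dropout, and everything beyond the ranks and target modules are assumptions added purely for illustration.

```python
# Illustrative sketch only -- the actual training used qlora-pipe, not this code.
# Ranks and target modules follow the card above; alpha/dropout values are assumed.
from peft import LoraConfig

# Stage one: instruct data, attention projections only, very high rank.
stage_one = LoraConfig(
    r=2048,
    lora_alpha=2048,  # assumption; not stated in the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)

# Stages two and three: raw text only, expert MLP projections.
stage_two = LoraConfig(r=1024, target_modules=["w1"], task_type="CAUSAL_LM")
stage_three = LoraConfig(r=256, target_modules=["w2", "w3"], task_type="CAUSAL_LM")

# Stage four: raw text, all of the above modules together at a small rank.
stage_four = LoraConfig(
    r=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "w1", "w2", "w3"],
    task_type="CAUSAL_LM",
)
```

Laid out this way, the design intent is easier to see: instruct data only ever touches the attention projections, while the experts' `w1`/`w2`/`w3` modules are trained exclusively on raw text.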