A first attempt at a Q6_K quantization of zetasepic's abliterated Qwen2.5-72B-Instruct model. The original 284 GB f32 GGUF file was converted to a 62.8 GB Q6_K quantization, which was then split into two files for upload.
Please note that during quantization, 80 of 562 tensors required fallback quantization to Q8, so there is still some question as to how optimal this quantization is.
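For reference, a conversion and split along these lines can be reproduced with the stock llama.cpp tools. This is only a sketch: the input file name below is an assumption, not the exact file used here.

```powershell
# Quantize the f32 GGUF down to Q6_K (input file name is illustrative)
.\llama-quantize.exe .\Qwen2.5-72B-Instruct-abliterated-f32.gguf .\Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf Q6_K

# Split the result into shards that fit under the 50 GB upload limit;
# llama-gguf-split appends -00001-of-0000N.gguf to the output prefix
.\llama-gguf-split.exe --split --split-max-size 48G .\Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf .\Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K
```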
## Instructions on how to use this model with Ollama (Windows 11 example)
You'll need to download both shards of this model into a separate folder; the file had to be split because of Hugging Face's 50 GB per-file upload limit.
Then download the latest binaries from the llama.cpp releases page. At the time of writing, the latest release is b4202; for a Windows PC with an Nvidia CUDA-capable GPU, the binaries are in the CUDA build (for b4202, llama-b4202-bin-win-cuda-cu12.4-x64.zip), and the accompanying cudart-llama-bin-win-cu12.4-x64.zip archive supplies the CUDA runtime DLLs if you don't already have them installed.
Extract the downloaded archive(s) into the same folder where you placed the quantized model shards.
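If you prefer to do the download and extraction from PowerShell, something like the following works (the exact asset names depend on the release you pick; the ones below match the b4202 example above):

```powershell
# Download the CUDA build of the llama.cpp binaries and the CUDA runtime archive
Invoke-WebRequest -Uri "https://github.com/ggerganov/llama.cpp/releases/download/b4202/llama-b4202-bin-win-cuda-cu12.4-x64.zip" -OutFile "llama-binaries.zip"
Invoke-WebRequest -Uri "https://github.com/ggerganov/llama.cpp/releases/download/b4202/cudart-llama-bin-win-cu12.4-x64.zip" -OutFile "cudart.zip"

# Extract both archives into the current folder, next to the model shards
Expand-Archive -Path "llama-binaries.zip" -DestinationPath .
Expand-Archive -Path "cudart.zip" -DestinationPath .
```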
Open Windows PowerShell. If you aren't already using it, I recommend PowerShell 7, preferably installed via WinGet.
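If you need PowerShell 7, it can be installed like this:

```powershell
# Install PowerShell 7 from the winget source
winget install --id Microsoft.PowerShell --source winget
```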
Change directory into the folder containing your model shards and the llama.cpp binaries, then run the following command:
```powershell
.\llama-gguf-split.exe --merge .\Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K-00001-of-00002.gguf .\Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf
```
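The merge will take a little while for a file this size. When it finishes, you can sanity-check the output from PowerShell, for example:

```powershell
# Length is reported in bytes; ~62.8 GB is roughly 6.28e10 bytes
Get-Item .\Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf | Format-List Name, Length
```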
You should now have a single file of approximately 62.8 GB named `Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf` in your directory.

Next, use PowerShell to create an empty `Modelfile`:

```powershell
New-Item -Path "Modelfile" -ItemType File
```

Open the `Modelfile` with Notepad and add the following, keeping in mind that you can alter the context length to whatever you like (here it's been set to 16K):

```
FROM Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf

# Set context length to 16K
PARAMETER num_ctx 16384
```
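The Modelfile format also accepts other runtime parameters if you want to pin sampling behavior; these lines are optional and the values below are purely illustrative:

```
# Optional sampling parameters (illustrative values), appended to the Modelfile
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.05
```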
Create the model in Ollama using the following command:
```powershell
ollama create Qwen2.5-72B-Instruct-abliterated-Q6_K -f Modelfile
```
It will take a few moments to transfer the model data into Ollama.
You can use `ollama list` to check whether the model has been loaded; you should see console output similar to this:

```
PS C:\AI\Q25-72B-Instruct-Abliterated-Q6_K-GGUF> ollama list
NAME                                           ID              SIZE     MODIFIED
Qwen2.5-72B-Instruct-abliterated-Q6_K:latest   1c78aabf213f    64 GB    5 seconds ago
```
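To give the model a quick smoke test directly from the terminal:

```powershell
ollama run Qwen2.5-72B-Instruct-abliterated-Q6_K "Summarize what you can help me with in one sentence."
```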
You should be good to go! At this point you can delete the folder you were working in; the model has been copied into your Ollama model storage and is ready for use with whatever service you've integrated with Ollama.
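For example, assuming the working folder from the console output above (adjust the path to yours):

```powershell
# Run this from outside the folder you are deleting
Remove-Item -Recurse -Force C:\AI\Q25-72B-Instruct-Abliterated-Q6_K-GGUF
```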
This model is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.
Model tree for blotfaba/Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf:
- Base model: Qwen/Qwen2.5-72B