A first attempt at a Q6_K quantization of zetasepic's abliterated Qwen2.5-72B-Instruct model. The original 284 GB f32 GGUF was converted to a 62.8 GB Q6_K file, then split into two parts for upload.

Please note that during quantization, 80 of the 562 tensors required fallback quantization to Q8, so it remains an open question how close to optimal this quantization is.
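
For reference, a conversion like this is typically done with llama.cpp's llama-quantize and llama-gguf-split tools. The commands below are only a sketch of that process; the input filename and the 40G shard size are assumptions, not the exact commands used for this upload:

    .\llama-quantize.exe .\Qwen-2.5-72B-Instruct-abliterated-v2-f32.gguf .\Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf Q6_K
    .\llama-gguf-split.exe --split --split-max-size 40G .\Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf .\Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K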

Instructions for using this model with Ollama (Windows 11 example)

  1. Download both shards of this model into a separate folder; the file had to be split into two parts because of Hugging Face's 50 GB per-file upload limit. One way to fetch them is shown below.
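
    If you have the Hugging Face CLI installed (pip install -U "huggingface_hub[cli]"), the following downloads every GGUF shard from the repo into the current folder. The repository name is taken from this card; the --include pattern assumes the shards are the only .gguf files in it:

    huggingface-cli download blotfaba/Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf --include "*.gguf" --local-dir .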

  2. Then, use the latest binaries from the llama.cpp releases page (https://github.com/ggerganov/llama.cpp/releases). For example, at the time of this writing the latest release is b4202; for a Windows PC with an NVIDIA CUDA-capable GPU, the archive to download would be the CUDA build (named llama-b4202-bin-win-cuda-cu12.4-x64.zip for that release), plus cudart-llama-bin-win-cu12.4-x64.zip if you don't already have the CUDA 12.4 runtime DLLs installed.

  3. Extract the zip file with the binaries into the same folder where you downloaded the quantized model shards.
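
    From PowerShell you can do this with Expand-Archive, for example (the archive name comes from step 2; adjust it to whichever release you actually downloaded):

    Expand-Archive -Path .\llama-b4202-bin-win-cuda-cu12.4-x64.zip -DestinationPath .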

  4. Open up Windows PowerShell. I recommend PowerShell 7, preferably installed via WinGet, if you aren't already using it.
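
    If you need it, PowerShell 7 can be installed with a single WinGet command:

    winget install --id Microsoft.PowerShell --source winget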

  5. Change directory into the folder where you have your model shards and the llama.cpp binaries located and run the following command:

    .\llama-gguf-split.exe --merge Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K-00001-of-00002.gguf Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf
    

    You should now have an approximately 62.8 GB file named Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf in your directory; this is the filename the Modelfile below refers to.
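
    Optionally, you can confirm the merged file and its size from PowerShell:

    Get-Item .\Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf | Select-Object Name, Length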

  6. Next, use PowerShell to create an empty Modelfile:

    New-Item -Path "Modelfile" -ItemType File
    
  7. Open the Modelfile with Notepad and add the following, keeping in mind that you can set the context length to whatever you like (here it's set to 16K):

    FROM Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf
    # Set context length to 16K
    PARAMETER num_ctx 16384
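
    Depending on your Ollama version, the chat template may not be picked up from the GGUF metadata automatically. If responses come back malformed, one option is to add Qwen2.5's ChatML prompt format to the Modelfile yourself; the TEMPLATE block below is a sketch of a standard Ollama ChatML template, not something specified by this card:

    TEMPLATE """{{ if .System }}<|im_start|>system
    {{ .System }}<|im_end|>
    {{ end }}<|im_start|>user
    {{ .Prompt }}<|im_end|>
    <|im_start|>assistant
    """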
    
  8. Create the model in Ollama using the following command:

    ollama create Qwen2.5-72B-Instruct-abliterated-Q6_K -f Modelfile
    

    It will take a few moments to transfer the model data into Ollama.

  9. You can use ollama list to check that the model has been created; you should see console output similar to this:

    PS C:\AI\Q25-72B-Instruct-Abliterated-Q6_K-GGUF> ollama list
    NAME                                                            ID              SIZE      MODIFIED
    Qwen2.5-72B-Instruct-abliterated-Q6_K:latest                    1c78aabf213f    64 GB     5 seconds ago
    

You should be good to go from here! Once you've confirmed the model works (see the quick test below), you can delete the folder you were working in, as the model has been copied into your Ollama model storage and is ready for use with whatever service you've integrated with Ollama.
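
A quick smoke test from the terminal, using the model name created in step 8:

    ollama run Qwen2.5-72B-Instruct-abliterated-Q6_K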


This model is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.

GGUF details: 72.7B parameters, qwen2 architecture, 6-bit (Q6_K) quantization.

Model tree for blotfaba/Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf: base model Qwen/Qwen2.5-72B; this model is a quantization of it.