A first attempt at a Q6_K quantization of zetasepic's abliterated Qwen2.5-72B-Instruct model. The original 284 GB f32 GGUF was converted to a 62.8 GB Q6_K file, then split into two parts for upload.

Please note that during quantization, 80 of the 562 tensors required fallback quantization to Q8, so it remains an open question how close to optimal this quantization is.
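
For reference, a conversion like this is typically done with llama.cpp's llama-quantize and llama-gguf-split tools. The commands below are only a sketch of that process; the input filename and the 40G shard size are assumptions, not the exact commands used for this upload:

    .\llama-quantize.exe .\Qwen-2.5-72B-Instruct-abliterated-v2-f32.gguf .\Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf Q6_K
    .\llama-gguf-split.exe --split --split-max-size 40G .\Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf .\Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K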

Instructions for using this model with Ollama (Windows 11 example)

  1. Download both shards of this model into a separate folder; the file had to be split into two parts because of Hugging Face's 50 GB per-file upload limit. One way to fetch them is shown below.
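
    If you have the Hugging Face CLI installed (pip install -U "huggingface_hub[cli]"), the following downloads every GGUF shard from the repo into the current folder. The repository name is taken from this card; the --include pattern assumes the shards are the only .gguf files in it:

    huggingface-cli download blotfaba/Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf --include "*.gguf" --local-dir .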

  2. Then, use the latest binaries from the llama.cpp releases page (https://github.com/ggerganov/llama.cpp/releases). For example, at the time of this writing the latest release is b4202; for a Windows PC with an NVIDIA CUDA-capable GPU, the archive to download would be the CUDA build (named llama-b4202-bin-win-cuda-cu12.4-x64.zip for that release), plus cudart-llama-bin-win-cu12.4-x64.zip if you don't already have the CUDA 12.4 runtime DLLs installed.

  3. Extract the zip file with the binaries into the same folder where you downloaded the quantized model shards.
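
    From PowerShell you can do this with Expand-Archive, for example (the archive name comes from step 2; adjust it to whichever release you actually downloaded):

    Expand-Archive -Path .\llama-b4202-bin-win-cuda-cu12.4-x64.zip -DestinationPath .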

  4. Open up Windows PowerShell. I recommend PowerShell 7, preferably installed via WinGet, if you aren't already using it.
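
    If you need it, PowerShell 7 can be installed with a single WinGet command:

    winget install --id Microsoft.PowerShell --source winget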

  5. Change directory into the folder where you have your model shards and the llama.cpp binaries located and run the following command:

    .\llama-gguf-split.exe --merge Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K-00001-of-00002.gguf Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf
    

    You should now have an approximately 62.8 GB file named Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf in your directory; this is the filename the Modelfile below refers to.
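
    Optionally, you can confirm the merged file and its size from PowerShell:

    Get-Item .\Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf | Select-Object Name, Length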

  6. Next, use PowerShell to create an empty Modelfile:

    New-Item -Path "Modelfile" -ItemType File
    
  7. Open the Modelfile with Notepad and add the following, keeping in mind that you can set the context length to whatever you like (here it's set to 16K):

    FROM Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf
    # Set context length to 16K
    PARAMETER num_ctx 16384
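
    Depending on your Ollama version, the chat template may not be picked up from the GGUF metadata automatically. If responses come back malformed, one option is to add Qwen2.5's ChatML prompt format to the Modelfile yourself; the TEMPLATE block below is a sketch of a standard Ollama ChatML template, not something specified by this card:

    TEMPLATE """{{ if .System }}<|im_start|>system
    {{ .System }}<|im_end|>
    {{ end }}<|im_start|>user
    {{ .Prompt }}<|im_end|>
    <|im_start|>assistant
    """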
    
  8. Create the model in Ollama using the following command:

    ollama create Qwen2.5-72B-Instruct-abliterated-Q6_K -f Modelfile
    

    It will take a few moments to transfer the model data into Ollama.

  9. You can use ollama list to check that the model has been created; you should see console output similar to this:

    PS C:\AI\Q25-72B-Instruct-Abliterated-Q6_K-GGUF> ollama list
    NAME                                                            ID              SIZE      MODIFIED
    Qwen2.5-72B-Instruct-abliterated-Q6_K:latest                    1c78aabf213f    64 GB     5 seconds ago
    

You should be good to go from here! Once you've confirmed the model works (see the quick test below), you can delete the folder you were working in, as the model has been copied into your Ollama model storage and is ready for use with whatever service you've integrated with Ollama.
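
A quick smoke test from the terminal, using the model name created in step 8:

    ollama run Qwen2.5-72B-Instruct-abliterated-Q6_K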


This model is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.

GGUF details: 72.7B parameters, qwen2 architecture, 6-bit (Q6_K) quantization.

Model tree for blotfaba/Qwen-2.5-72B-Instruct-abliterated-v2-Q6_K.gguf: base model Qwen/Qwen2.5-72B; this model is a quantization of it.