# Guide: Using a Custom Fine-Tuned Model with bitnet.cpp
This document outlines the process of downloading a custom fine-tuned model, converting it to the GGUF format, compiling the necessary C++ code, and running inference.
## Prerequisites
Before you begin, ensure you have the following prerequisites installed and configured:
- Python 3.9 or later
- CMake 3.22 or later
- A C++ compiler (e.g., clang, g++)
- The Hugging Face Hub CLI (`huggingface-cli`)
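For a quick sanity check before starting, the small helper below (a convenience sketch, not part of the bitnet.cpp repository) verifies that Python is recent enough and that the required command-line tools are on your PATH.

```python
import shutil
import sys

# The conversion and setup scripts require Python 3.9 or later.
assert sys.version_info >= (3, 9), f"Python 3.9+ required, found {sys.version.split()[0]}"

# CMake, a C++ compiler (clang++ or g++ is enough), and huggingface-cli
# must be installed; report which of them are missing from PATH.
for tool in ("cmake", "clang++", "g++", "huggingface-cli"):
    status = "found" if shutil.which(tool) else "MISSING"
    print(f"{tool:16s} {status}")
```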
## Step 1: Download the Custom Model
In this guide, we will use the `tuandunghcmut/BitNET-Summarization` model as an example. This model was fine-tuned by tuandunghcmut for summarization tasks. We will download it and place it in a directory that the `setup_env.py` script can recognize.

```bash
huggingface-cli download tuandunghcmut/BitNET-Summarization --local-dir models/BitNet-b1.58-2B-4T
```

This command downloads the model into the `models/BitNet-b1.58-2B-4T` directory. Reusing this directory name is a workaround that lets the existing scripts recognize the custom model.
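If you prefer to do this from Python instead of the CLI, the same download can be performed with `huggingface_hub.snapshot_download`; this is an equivalent alternative, not an additional step.

```python
from huggingface_hub import snapshot_download

# Download the fine-tuned model into the directory expected by setup_env.py.
snapshot_download(
    repo_id="tuandunghcmut/BitNET-Summarization",
    local_dir="models/BitNet-b1.58-2B-4T",
)
```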
## Step 2: Convert the Model to GGUF Format
The downloaded model is in the `.safetensors` format and must be converted to GGUF before it can be used with `bitnet.cpp`. We will use the `convert-helper-bitnet.py` script for this. However, the script needs some modifications to work with this custom model.
### Modifications to the Conversion Scripts
- `utils/convert-helper-bitnet.py`: Add the `--skip-unknown` flag to the `cmd_convert` list so that unknown tensor names are ignored.

  ```python
  cmd_convert = [
      sys.executable,
      str(convert_script),
      str(model_dir),
      "--vocab-type", "bpe",
      "--outtype", "f32",
      "--concurrency", "1",
      "--outfile", str(gguf_f32_output),
      "--skip-unknown",
  ]
  ```
- `utils/convert-hf-to-gguf-bitnet.py`:
  - Add the `BitNetForCausalLM` architecture to the `@Model.register` decorator for the `BitnetModel` class.
  - Change the `set_vocab` method in the `BitnetModel` class to use `_set_vocab_gpt2()`.

  ```python
  @Model.register("BitNetForCausalLM", "BitnetForCausalLM")
  class BitnetModel(Model):
      model_arch = gguf.MODEL_ARCH.BITNET

      def set_vocab(self):
          self._set_vocab_gpt2()
  ```
### Running the Conversion
After making these changes, run the conversion script:
```bash
python utils/convert-helper-bitnet.py models/BitNet-b1.58-2B-4T
```

This will create the `ggml-model-i2s-bitnet.gguf` file in the model directory.
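As a quick check that the conversion succeeded, you can confirm the output file exists and has a plausible size (a simple illustration; the exact size depends on the model).

```python
from pathlib import Path

gguf_path = Path("models/BitNet-b1.58-2B-4T/ggml-model-i2s-bitnet.gguf")
if gguf_path.exists():
    print(f"OK: {gguf_path} ({gguf_path.stat().st_size / 1e9:.2f} GB)")
else:
    print(f"Missing: {gguf_path} - re-run the conversion and check its log output")
```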
## Step 3: Compile bitnet.cpp
Now we need to compile the C++ code. We will use the `setup_env.py` script for this, with the `i2_s` quantization type.

```bash
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
```
This command will compile the C++ code and create the necessary binaries.
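To confirm the build produced something usable, you can list the compiled binaries. This assumes the default `build` output directory used by upstream bitnet.cpp; adjust the path if your checkout differs.

```python
from pathlib import Path

build_dir = Path("build")
if build_dir.is_dir():
    # List compiled binaries so you can confirm the build step finished.
    for exe in sorted(build_dir.rglob("bin/*")):
        print(exe)
else:
    print("No build/ directory found - re-run setup_env.py and check its logs")
```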
## Step 4: Run Inference
Finally, we can run inference with the converted model.
```bash
python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "Hello"
```
This will load the model and generate a response to the prompt "Hello".
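Since the model was fine-tuned for summarization, you will typically want to pass a longer prompt containing the text to summarize. The snippet below simply wraps the same `run_inference.py` invocation with `subprocess`; the prompt wording is only an illustration, as the exact prompt format the fine-tune expects is not documented here.

```python
import subprocess

article = "BitNet models use ternary weights to cut memory use and speed up CPU inference. ..."
prompt = f"Summarize the following text:\n{article}\nSummary:"

# Same flags as the command above: -m selects the GGUF file, -p the prompt.
subprocess.run([
    "python", "run_inference.py",
    "-m", "models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf",
    "-p", prompt,
])
```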
## Build Environment
This project was built and compiled on a CPU-only machine with the following specifications:
- CPU: AMD EPYC 9754 128-Core Processor
- Memory: 251 GiB
## Fine-Tuning
The `tuandunghcmut/BitNET-Summarization` model was fine-tuned using a Quantization-Aware Training (QAT) process, built on the BitNet layer support in the Hugging Face library.
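For readers unfamiliar with QAT for BitNet-style models, the sketch below illustrates the core idea: weights are quantized to the ternary values {-1, 0, +1} in the forward pass while gradients update the underlying full-precision weights (a straight-through estimator). This is a minimal PyTorch illustration of the concept, not the actual BitNet layer from the Hugging Face library or the exact recipe used for this model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinearSketch(nn.Linear):
    """Illustrative 1.58-bit linear layer: weights are quantized to {-1, 0, +1}
    on the fly in the forward pass, while full-precision weights are kept for
    the optimizer (straight-through estimator)."""

    def forward(self, x):
        w = self.weight
        # Per-tensor scale: mean absolute value of the weights.
        scale = w.abs().mean().clamp(min=1e-5)
        # Round the scaled weights to the nearest value in {-1, 0, +1}.
        w_q = (w / scale).round().clamp(-1, 1) * scale
        # Straight-through estimator: quantized weights in the forward pass,
        # but gradients flow to the full-precision weights.
        w_ste = w + (w_q - w).detach()
        return F.linear(x, w_ste, self.bias)

# Usage: swap nn.Linear layers for BitLinearSketch before fine-tuning.
layer = BitLinearSketch(64, 32)
out = layer(torch.randn(2, 64))
```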