# Guide: Using a Custom Fine-Tuned Model with bitnet.cpp
This document outlines the process of downloading a custom fine-tuned model, converting it to the GGUF format, compiling the necessary C++ code, and running inference.
## Prerequisites
Before you begin, ensure you have the following prerequisites installed and configured:
- Python 3.9 or later
- CMake 3.22 or later
- A C++ compiler (e.g., clang, g++)
- The Hugging Face Hub CLI (`huggingface-cli`)
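For a quick sanity check before starting, the small helper below (a convenience sketch, not part of the bitnet.cpp repository) verifies that Python is recent enough and that the required command-line tools are on your PATH.

```python
import shutil
import sys

# The conversion and setup scripts require Python 3.9 or later.
assert sys.version_info >= (3, 9), f"Python 3.9+ required, found {sys.version.split()[0]}"

# CMake, a C++ compiler (clang++ or g++ is enough), and huggingface-cli
# must be installed; report which of them are missing from PATH.
for tool in ("cmake", "clang++", "g++", "huggingface-cli"):
    status = "found" if shutil.which(tool) else "MISSING"
    print(f"{tool:16s} {status}")
```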
## Step 1: Download the Custom Model
In this guide, we will use the `tuandunghcmut/BitNET-Summarization` model as an example. This model was fine-tuned by tuandunghcmut for summarization tasks. We will download it and place it in a directory that the `setup_env.py` script can recognize.

```bash
huggingface-cli download tuandunghcmut/BitNET-Summarization --local-dir models/BitNet-b1.58-2B-4T
```

This command downloads the model into the `models/BitNet-b1.58-2B-4T` directory. Reusing this directory name is a workaround that lets the existing scripts recognize the custom model.
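If you prefer to do this from Python instead of the CLI, the same download can be performed with `huggingface_hub.snapshot_download`; this is an equivalent alternative, not an additional step.

```python
from huggingface_hub import snapshot_download

# Download the fine-tuned model into the directory expected by setup_env.py.
snapshot_download(
    repo_id="tuandunghcmut/BitNET-Summarization",
    local_dir="models/BitNet-b1.58-2B-4T",
)
```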
## Step 2: Convert the Model to GGUF Format
The downloaded model is in the `.safetensors` format and must be converted to GGUF before it can be used with `bitnet.cpp`. We will use the `convert-helper-bitnet.py` script for this. However, the script needs some modifications to work with this custom model.
### Modifications to the Conversion Scripts
- `utils/convert-helper-bitnet.py`: Add the `--skip-unknown` flag to the `cmd_convert` list so that unknown tensor names are ignored.

  ```python
  cmd_convert = [
      sys.executable,
      str(convert_script),
      str(model_dir),
      "--vocab-type", "bpe",
      "--outtype", "f32",
      "--concurrency", "1",
      "--outfile", str(gguf_f32_output),
      "--skip-unknown",
  ]
  ```
- `utils/convert-hf-to-gguf-bitnet.py`:
  - Add the `BitNetForCausalLM` architecture to the `@Model.register` decorator for the `BitnetModel` class.
  - Change the `set_vocab` method in the `BitnetModel` class to use `_set_vocab_gpt2()`.

  ```python
  @Model.register("BitNetForCausalLM", "BitnetForCausalLM")
  class BitnetModel(Model):
      model_arch = gguf.MODEL_ARCH.BITNET

      def set_vocab(self):
          self._set_vocab_gpt2()
  ```
### Running the Conversion
After making these changes, run the conversion script:
```bash
python utils/convert-helper-bitnet.py models/BitNet-b1.58-2B-4T
```

This will create the `ggml-model-i2s-bitnet.gguf` file in the model directory.
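As a quick check that the conversion succeeded, you can confirm the output file exists and has a plausible size (a simple illustration; the exact size depends on the model).

```python
from pathlib import Path

gguf_path = Path("models/BitNet-b1.58-2B-4T/ggml-model-i2s-bitnet.gguf")
if gguf_path.exists():
    print(f"OK: {gguf_path} ({gguf_path.stat().st_size / 1e9:.2f} GB)")
else:
    print(f"Missing: {gguf_path} - re-run the conversion and check its log output")
```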
## Step 3: Compile bitnet.cpp
Now we need to compile the C++ code. We will use the `setup_env.py` script for this, with the `i2_s` quantization type.

```bash
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s
```
This command will compile the C++ code and create the necessary binaries.
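To confirm the build produced something usable, you can list the compiled binaries. This assumes the default `build` output directory used by upstream bitnet.cpp; adjust the path if your checkout differs.

```python
from pathlib import Path

build_dir = Path("build")
if build_dir.is_dir():
    # List compiled binaries so you can confirm the build step finished.
    for exe in sorted(build_dir.rglob("bin/*")):
        print(exe)
else:
    print("No build/ directory found - re-run setup_env.py and check its logs")
```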
## Step 4: Run Inference
Finally, we can run inference with the converted model.
```bash
python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "Hello"
```
This will load the model and generate a response to the prompt "Hello".
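Since the model was fine-tuned for summarization, you will typically want to pass a longer prompt containing the text to summarize. The snippet below simply wraps the same `run_inference.py` invocation with `subprocess`; the prompt wording is only an illustration, as the exact prompt format the fine-tune expects is not documented here.

```python
import subprocess

article = "BitNet models use ternary weights to cut memory use and speed up CPU inference. ..."
prompt = f"Summarize the following text:\n{article}\nSummary:"

# Same flags as the command above: -m selects the GGUF file, -p the prompt.
subprocess.run([
    "python", "run_inference.py",
    "-m", "models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf",
    "-p", prompt,
])
```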
## Build Environment
This project was built and compiled on a CPU-only machine with the following specifications:
- CPU: AMD EPYC 9754 128-Core Processor
- Memory: 251 GiB
## Fine-Tuning
The `tuandunghcmut/BitNET-Summarization` model was fine-tuned using a Quantization-Aware Training (QAT) process, built on the BitNet layer support in the Hugging Face library.
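For readers unfamiliar with QAT for BitNet-style models, the sketch below illustrates the core idea: weights are quantized to the ternary values {-1, 0, +1} in the forward pass while gradients update the underlying full-precision weights (a straight-through estimator). This is a minimal PyTorch illustration of the concept, not the actual BitNet layer from the Hugging Face library or the exact recipe used for this model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinearSketch(nn.Linear):
    """Illustrative 1.58-bit linear layer: weights are quantized to {-1, 0, +1}
    on the fly in the forward pass, while full-precision weights are kept for
    the optimizer (straight-through estimator)."""

    def forward(self, x):
        w = self.weight
        # Per-tensor scale: mean absolute value of the weights.
        scale = w.abs().mean().clamp(min=1e-5)
        # Round the scaled weights to the nearest value in {-1, 0, +1}.
        w_q = (w / scale).round().clamp(-1, 1) * scale
        # Straight-through estimator: quantized weights in the forward pass,
        # but gradients flow to the full-precision weights.
        w_ste = w + (w_q - w).detach()
        return F.linear(x, w_ste, self.bias)

# Usage: swap nn.Linear layers for BitLinearSketch before fine-tuning.
layer = BitLinearSketch(64, 32)
out = layer(torch.randn(2, 64))
```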