Guide: Using a Custom Fine-Tuned Model with bitnet.cpp

This document outlines the process of downloading a custom fine-tuned model, converting it to the GGUF format, compiling the necessary C++ code, and running inference.

Prerequisites

Before you begin, ensure you have the following prerequisites installed and configured:

  • Python 3.9 or later
  • CMake 3.22 or later
  • A C++ compiler (e.g., clang, g++)
  • The Hugging Face Hub CLI (huggingface-cli)
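
If the Hugging Face Hub CLI is not installed yet, it can usually be installed with pip (a typical invocation; adjust it to your environment):

pip install -U "huggingface_hub[cli]"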

Step 1: Download the Custom Model

In this guide, we will use the tuandunghcmut/BitNET-Summarization model, which was fine-tuned for summarization tasks, as an example. We will download it and place it in a directory that the setup_env.py script can recognize.

huggingface-cli download tuandunghcmut/BitNET-Summarization --local-dir models/BitNet-b1.58-2B-4T

This command downloads the model into the models/BitNet-b1.58-2B-4T directory. Reusing the base model's directory name is a workaround that lets the existing scripts recognize the custom model without further changes.
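
If you prefer to script the download rather than use the CLI, the same result can be obtained with the huggingface_hub Python API (a minimal sketch; it reuses the repo id and target directory from the command above):

    from huggingface_hub import snapshot_download

    # Download the fine-tuned model into the directory expected by setup_env.py
    snapshot_download(
        repo_id="tuandunghcmut/BitNET-Summarization",
        local_dir="models/BitNet-b1.58-2B-4T",
    )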

Step 2: Convert the Model to GGUF Format

The downloaded model is in the .safetensors format and needs to be converted to GGUF before bitnet.cpp can use it. We will use the convert-helper-bitnet.py script for this.

However, the script needs some modifications to work with this custom model.

Modifications to the Conversion Scripts

  1. utils/convert-helper-bitnet.py: Add the --skip-unknown flag to the cmd_convert list to ignore unknown tensor names.

    cmd_convert = [
        sys.executable,
        str(convert_script),
        str(model_dir),
        "--vocab-type", "bpe",
        "--outtype", "f32",
        "--concurrency", "1",
        "--outfile", str(gguf_f32_output),
        "--skip-unknown"
    ]
    
  2. utils/convert-hf-to-gguf-bitnet.py:

    • Add the BitNetForCausalLM architecture to the @Model.register decorator for the BitnetModel class.
    • Change the set_vocab method in the BitnetModel class to use _set_vocab_gpt2(), as shown in the snippet below.

    @Model.register("BitNetForCausalLM", "BitnetForCausalLM")
    class BitnetModel(Model):
        model_arch = gguf.MODEL_ARCH.BITNET
    
        def set_vocab(self):
            self._set_vocab_gpt2()
    

Running the Conversion

After making these changes, run the conversion script:

python utils/convert-helper-bitnet.py models/BitNet-b1.58-2B-4T

This will create the ggml-model-i2s-bitnet.gguf file in the model directory.
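
As an optional sanity check, the converted file can be read back with the gguf Python package (installable via pip install gguf); the file name below is the one produced by the step above and may differ on your setup:

    from gguf import GGUFReader  # from the gguf Python package

    # Read back the GGUF header to confirm the conversion produced a valid file
    reader = GGUFReader("models/BitNet-b1.58-2B-4T/ggml-model-i2s-bitnet.gguf")
    print("tensor count:", len(reader.tensors))
    print("metadata keys:", list(reader.fields.keys())[:10])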

Step 3: Compile bitnet.cpp

Now we need to compile the C++ code. We will use the setup_env.py script with the i2_s quantization type.

python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s

This command will compile the C++ code and create the necessary binaries.
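
Once the build finishes, the compiled inference binaries typically land under build/bin on Linux (this path assumes the default CMake layout used by llama.cpp-based projects and may differ on your platform); run_inference.py in the next step invokes them for you. You can confirm they are present with:

ls build/bin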

Step 4: Run Inference

Finally, we can run inference with the converted model.

python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "Hello"

This will load the model and generate a response to the prompt "Hello".
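
Since this model is fine-tuned for summarization, a more realistic invocation passes the text to be summarized in the prompt and caps the number of generated tokens. The -n (tokens to predict) and -t (threads) options below are the ones exposed by run_inference.py at the time of writing; check python run_inference.py --help if your version differs.

python run_inference.py -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -n 128 -t 8 -p "Summarize the following article: <article text>"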

Build Environment

This project was built and compiled on a CPU-only machine with the following specifications:

  • CPU: AMD EPYC 9754 128-Core Processor
  • Memory: 251 GiB

Fine-Tuning

The tuandunghcmut/BitNET-Summarization model was fine-tuned using a Quantization-Aware Training (QAT) process, with support from the BitNet layer provided by the Hugging Face library.
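
To give a sense of what this involves, the sketch below illustrates the core idea of a BitNet-style linear layer during QAT: weights are quantized to {-1, 0, +1} with an absmean scale in the forward pass, while gradients flow to the full-precision weights through a straight-through estimator. This is an illustrative sketch only, not the actual layer or training code used for this model.

    # Illustrative sketch of ternary (1.58-bit) quantization-aware training.
    # NOT the exact layer used for this model; it only shows the general idea.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TernaryLinear(nn.Module):  # hypothetical name, for illustration only
        def __init__(self, in_features, out_features):
            super().__init__()
            self.weight = nn.Parameter(torch.empty(out_features, in_features))
            nn.init.xavier_uniform_(self.weight)

        def forward(self, x):
            w = self.weight
            scale = w.abs().mean().clamp(min=1e-5)             # absmean scale
            w_q = torch.round(w / scale).clamp(-1, 1) * scale  # ternary weights
            # Straight-through estimator: use the quantized weights in the
            # forward pass, but route gradients to the full-precision weights.
            w_ste = w + (w_q - w).detach()
            return F.linear(x, w_ste)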
