This repository contains an FPGA-friendly reimplementation of the HuggingFaceTB/nanowhale-100m-base model, designed for deployment on Xilinx FPGAs.
Based on literature research for FPGA deployment (QFX, arXiv:2401.17544; BitNet b1.58 Reloaded, arXiv:2407.09527):
Training uses knowledge distillation from the original NanoWhale teacher model, combined with quantization-aware training (QAT), on the FineWeb-Edu dataset.
python train.py \
--teacher_model HuggingFaceTB/nanowhale-100m-base \
--hub_model_id hakatu/fpga-whale-100m \
--distill_alpha 0.7 \
--temperature 2.0 \
--num_train_samples 50000 \
--max_seq_length 512 \
--bf16
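
To illustrate how `--distill_alpha` and `--temperature` could enter the training objective, here is a minimal sketch of a standard distillation loss for a single token position. It is not taken from `train.py`: the function names are hypothetical, and it assumes that alpha weights the soft-target (teacher) term, that `1 - alpha` weights the hard-label cross-entropy, and that the soft term is scaled by the squared temperature, as is common in distillation setups. The actual script may combine the terms differently.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of `logits` divided by `temperature`."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(student_logits, teacher_logits, target,
                      alpha=0.7, temperature=2.0):
    """Hypothetical per-token loss: alpha * T^2 * KL + (1 - alpha) * CE.

    Assumption: alpha weights the teacher term; train.py may differ.
    """
    # Soft-target term: KL(teacher || student) on temperature-softened outputs.
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kd = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))

    # Hard-label term: cross-entropy of the student at temperature 1.
    ce = -log_softmax(student_logits)[target]

    # The T^2 factor keeps the soft-term gradients comparable across temperatures.
    return alpha * temperature ** 2 * kd + (1.0 - alpha) * ce


if __name__ == "__main__":
    # Toy three-token vocabulary; the values are illustrative only.
    print(distillation_loss([2.0, 0.5, -1.0], [1.5, 1.0, -0.5], target=0))
```

With identical student and teacher logits the KL term vanishes and only the weighted cross-entropy remains, which makes a quick sanity check for any implementation of this form.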