FPGA-Whale: FPGA-Friendly NanoWhale Architecture

This repository contains an FPGA-friendly reimplementation of the HuggingFaceTB/nanowhale-100m-base model, designed for deployment on Xilinx FPGAs.

Architecture Changes

The following changes are based on the literature on FPGA deployment (QFX, arXiv:2401.17544; BitNet b1.58 Reloaded, arXiv:2407.09527):

  1. Removed MoE → Dense SwiGLU FFN (no sparse routing, deterministic memory access)
  2. Removed Hyper-Connections → Standard residual connections (simpler dataflow)
  3. BitLinear layers → Ternary weights {-1, 0, +1} with INT8 activations
    • Multiplication-free inference on FPGA (ternary × int8 = add/sub/nop)
    • Quantization-aware training (QAT) with a straight-through estimator (STE)
  4. LayerNorm instead of RMSNorm → More FPGA-friendly normalization
  5. Kept MLA (Multi-head Latent Attention) → Low-rank Q projection for KV-cache efficiency
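
To make item 3 concrete, here is a minimal pure-Python sketch of why ternary weights with INT8 activations need no multipliers. It assumes absmean weight scaling (as in BitNet b1.58) and per-vector absmax activation scaling; the function names are illustrative and are not taken from this repository, whose BitLinear layer may differ in detail.

```python
def quantize_weights_ternary(weights, eps=1e-8):
    """Map float weights to {-1, 0, +1} plus one per-tensor scale (absmean, assumed)."""
    flat = [w for row in weights for w in row]
    scale = sum(abs(w) for w in flat) / len(flat) + eps
    ternary = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return ternary, scale


def quantize_activations_int8(x, eps=1e-8):
    """Map float activations to INT8 with one absmax scale (assumed)."""
    scale = max(abs(v) for v in x) / 127.0 + eps
    q = [max(-128, min(127, round(v / scale))) for v in x]
    return q, scale


def ternary_matvec(ternary, x_q):
    """Integer matrix-vector product using only add, subtract, and skip."""
    out = []
    for row in ternary:
        acc = 0
        for w, x in zip(row, x_q):
            if w == 1:
                acc += x      # +1 weight: add the activation
            elif w == -1:
                acc -= x      # -1 weight: subtract the activation
            # 0 weight: no operation
        out.append(acc)
    return out


weights = [[0.9, -0.05, -1.1], [0.02, 0.8, 0.7]]
x = [0.5, -2.0, 1.5]

w_t, w_scale = quantize_weights_ternary(weights)
x_q, x_scale = quantize_activations_int8(x)
y_int = ternary_matvec(w_t, x_q)
# One floating-point rescale per output element recovers the real-valued result.
y = [v * w_scale * x_scale for v in y_int]
```

On an FPGA the inner loop maps to an adder tree with a 2-bit select per weight, so no DSP multipliers are needed for the linear layers; only the final rescale uses a multiply.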

Training

The model is trained by knowledge distillation from the original NanoWhale teacher model, combined with quantization-aware training (QAT), on the FineWeb-Edu dataset.

python train.py \
  --teacher_model HuggingFaceTB/nanowhale-100m-base \
  --hub_model_id hakatu/fpga-whale-100m \
  --distill_alpha 0.7 \
  --temperature 2.0 \
  --num_train_samples 50000 \
  --max_seq_length 512 \
  --bf16
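
For orientation, the sketch below shows one common way --distill_alpha and --temperature enter a distillation objective: a temperature-softened KL term weighted by alpha, plus a hard-label cross-entropy term weighted by 1 - alpha. This is an assumption about the usual formulation, not a copy of train.py, which may weight or scale the terms differently; the function names are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def distillation_loss(student_logits, teacher_logits, target,
                      alpha=0.7, temperature=2.0):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    s_log = log_softmax(student_logits, temperature)
    t_log = log_softmax(teacher_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(math.exp(t) * (t - s) for t, s in zip(t_log, s_log))
    # Standard cross-entropy against the ground-truth token (temperature 1).
    ce = -log_softmax(student_logits)[target]
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


# Student agrees with the teacher: only the cross-entropy term remains.
loss_same = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0], target=0)
# Student disagrees with the teacher: the KL term adds to the loss.
loss_diff = distillation_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0], target=0)
```

The T^2 factor is the conventional correction that keeps the gradient magnitude of the softened term comparable across temperatures.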

FPGA Deployment Path

  1. Train this model with QAT to convergence
  2. Extract ternary weights and INT8 activation scales
  3. Export to ONNX, or convert with hls4ml
  4. Synthesize with Xilinx Vitis HLS / Vivado
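
As an illustration of step 2, extracted ternary weights can be stored compactly before they are handed to the HLS flow. The sketch below packs four weights per byte using a hypothetical 2-bit code (00 = 0, 01 = +1, 10 = -1); this layout is an assumption for illustration only, not a format defined by this repository, ONNX, or hls4ml.

```python
# Hypothetical 2-bit encoding of ternary values; code 0b11 is unused.
ENCODE = {0: 0b00, 1: 0b01, -1: 0b10}
DECODE = {code: value for value, code in ENCODE.items()}


def pack_ternary(values):
    """Pack ternary values into bytes, four 2-bit codes per byte, LSB first."""
    packed = bytearray()
    for i in range(0, len(values), 4):
        byte = 0
        for j, v in enumerate(values[i:i + 4]):
            byte |= ENCODE[v] << (2 * j)
        packed.append(byte)  # a short final group is zero-padded
    return bytes(packed)


def unpack_ternary(packed, count):
    """Recover the first `count` ternary values from packed bytes."""
    values = []
    for byte in packed:
        for j in range(4):
            values.append(DECODE[(byte >> (2 * j)) & 0b11])
    return values[:count]


w = [1, 0, -1, -1, 0, 1]
blob = pack_ternary(w)
restored = unpack_ternary(blob, len(w))
```

At 2 bits per weight the storage is a quarter of INT8, which matters when weights must fit in on-chip block RAM.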
