Is it bitnet {-1,0,1}?

#3
by Remek - opened

I looked through many BitNet 1.58 implementations and noticed that they all use the method suggested in "The Era of 1-bit LLMs: Training Tips, Code and FAQ". The weights of the models currently trained according to this recipe are not values from the set {-1, 0, 1} but rather values in the interval (0, 1). Is this how it is supposed to be?

  1. The formula describing the quantization of weights ("The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits").
  2. Implementation proposal ("The Era of 1-bit LLMs: Training Tips, Code and FAQ"); a minimal sketch of this quantizer is shown below the attached image.
  3. Weight quantization test.
  4. Model during training.

[Attached image: 1.58bitnet.jpg]
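
For concreteness, here is a minimal sketch of the absmean ternary quantizer and the on-the-fly quantization scheme described in the "Training Tips, Code and FAQ" report, assuming a PyTorch setting. The `weight_quant` and `BitLinear` names and the straight-through-estimator wrapper are illustrative simplifications (activation quantization is omitted), not the exact code from any particular checkpoint:

```python
import torch


def weight_quant(w: torch.Tensor) -> torch.Tensor:
    # Absmean scaling: scale by the mean absolute value, round to the
    # nearest integer, clip to {-1, 0, 1}, then rescale back.
    scale = 1.0 / w.abs().mean().clamp(min=1e-5)
    return (w * scale).round().clamp(-1, 1) / scale


class BitLinear(torch.nn.Linear):
    # The latent weights stay in full precision (fp16/bf16) during training;
    # ternary values are produced on the fly in the forward pass, with a
    # straight-through estimator so gradients flow to the latent weights.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        w_q = w + (weight_quant(w) - w).detach()  # forward: ternary, backward: identity
        return torch.nn.functional.linear(x, w_q, self.bias)
```

Under this scheme the ternary values exist only transiently inside the forward pass; what gets saved to the checkpoint are the latent full-precision weights.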

Sadly no. It's fp16. Honestly, I don't understand the reason for training in fp16. Why is research not carried forward from where the paper left off? Why not train another 1-bit model, but with more parameters, more training data, or for longer, or better yet with a good combination of these? The paper already showed that 1-bit is a good contender against all the other fp models (or int quants), so why even bother with other things? Anyway, I hope someone can carry this research forward from here without needing google resources (pun intended).
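
One quick way to check this on a released checkpoint is to inspect the stored tensors directly. A sketch, where the file and tensor names are placeholders rather than keys from this repository:

```python
import torch
from safetensors.torch import load_file

# Placeholder shard and key names; substitute the actual checkpoint file
# and a linear-layer weight key from the model being inspected.
state = load_file("model.safetensors")
w = state["model.layers.0.self_attn.q_proj.weight"]

print(w.dtype)                          # e.g. torch.float16 -> latent full-precision weights
print(torch.unique(w.float()).numel())  # a ternary tensor would have at most 3 distinct values
```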
