Yes, exactly. When converting from float16 to float32 for fine-tuning (as I thought), we need to pad the 10-bit mantissa with 13 zero bits and widen the 5-bit exponent to 8 bits (rebiasing it from 15 to 127), rather than simply filling the last 16 bits with zeros.
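To make that concrete, here is a rough sketch of the bit-level F16 to F32 expansion (function names are mine, just for illustration; in practice something like numpy's `.astype(np.float32)` does this for you):

```python
import struct

def f16_bits_to_f32_bits(h: int) -> int:
    """Expand a raw float16 bit pattern into the equivalent float32 bit pattern.

    float16: 1 sign bit | 5 exponent bits (bias 15)  | 10 mantissa bits
    float32: 1 sign bit | 8 exponent bits (bias 127) | 23 mantissa bits
    """
    sign = (h >> 15) & 0x1
    exp = (h >> 10) & 0x1F
    frac = h & 0x3FF

    if exp == 0x1F:                 # Inf / NaN: exponent becomes all ones
        exp32, frac32 = 0xFF, frac << 13
    elif exp == 0:                  # zero or float16 subnormal
        exp32, frac32 = 0, 0
        if frac:                    # renormalise a subnormal into a float32 normal
            shift = 0
            while not (frac & 0x400):
                frac <<= 1
                shift += 1
            exp32 = 113 - shift     # = (-14 - shift) + 127
            frac32 = (frac & 0x3FF) << 13
    else:                           # normal number: rebias exponent, pad mantissa with 13 zeros
        exp32 = exp - 15 + 127
        frac32 = frac << 13

    return (sign << 31) | (exp32 << 23) | frac32

def f16_to_f32(h: int) -> float:
    return struct.unpack("<f", struct.pack("<I", f16_bits_to_f32_bits(h)))[0]

# 1.5 is 0x3E00 in float16; the float32 result is 0x3FC00000,
# not 0x3E000000 followed by 16 zero bits.
print(hex(f16_bits_to_f32_bits(0x3E00)))   # 0x3fc00000
print(f16_to_f32(0x3E00))                  # 1.5
```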
I don't understand much about this, but maybe the model in F32 is just redundant. Maybe the lower half of the bits in most weights is just zeros. Maybe it was stored this way to make fine-tuning easier, or to make it impossible for people with few resources to run it.
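If that guess is right, it should be checkable: an F32 tensor that was upcast from F16 has the low 13 mantissa bits of every value equal to zero. A rough numpy sketch of such a check (the function name and the toy data are just illustrative):

```python
import numpy as np

def looks_upcast_from_f16(w: np.ndarray) -> bool:
    """Heuristic: if an F32 tensor came from F16, the low 13 mantissa bits are all zero."""
    bits = w.astype(np.float32).view(np.uint32)
    return bool(np.all(bits & 0x1FFF == 0))

rng = np.random.default_rng(0)
native_f32 = rng.standard_normal(1000, dtype=np.float32)
upcast_f32 = native_f32.astype(np.float16).astype(np.float32)

print(looks_upcast_from_f16(native_f32))  # almost certainly False
print(looks_upcast_from_f16(upcast_f32))  # True
```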