FP16 vs BF16

#2
by Virt-io

Can you start uploading new merges in BF16?

Can you start uploading new merges in BF16?

Is bf16 backwards compatible? I remember not being able to load in bf16 on my GPU, but I don't know if things like textgen-webui can convert bf16 to fp16 when loading.

I'm not sure; I can't load models on my GPU without offloading.

I assume you can load BF16; textgen would just quantize to 4-bit for inference.
What you ran into was probably trying to run inference in bf16 without quantizing, which is only available on newer GPUs.

The reason I'm asking for BF16 is that most models, including the base Meta models, are in that format.
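
For reference, roughly what that 4-bit load amounts to with the transformers loader plus bitsandbytes; the repo name is a placeholder and the flags are from memory, so treat it as a sketch rather than textgen's exact code:

    # Sketch: load a (possibly bf16) checkpoint quantized to 4-bit for inference.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    repo = "some-user/some-llama3-merge"  # placeholder repo id

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                     # quantize weights to 4-bit at load time
        bnb_4bit_compute_dtype=torch.float16,  # do the math in fp16, which older GPUs support
    )

    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo, quantization_config=bnb_config)
    # Once quantized, whether the checkpoint was stored as fp16 or bf16 matters much less.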

I'm not sure; I can't load models on my GPU without offloading.

I assume you can load BF16; textgen would just quantize to 4-bit for inference.
What you ran into was probably trying to run inference in bf16 without quantizing, which is only available on newer GPUs.

The reason I'm asking for BF16 is that most models, including the base Meta models, are in that format.

I imagine I just switch the type in config to bfloat16 instead of just float16?
I don't understand the differences in precision and range so it means nothing to me πŸ˜Άβ€πŸŒ«οΈ
Plus the largest "consumer" GPU below the 30 series has 11GB. An 8B model isn't fitting in that at 16-bit.
Titans aren't included. If you bought one, you're insane :3
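
Napkin math for the weights alone (no context cache or activations):

    params = 8e9          # 8B parameters
    bytes_per_param = 2   # fp16 / bf16
    print(params * bytes_per_param / 1e9)  # ~16 GB, well over 11GB
    print(params * 0.5 / 1e9)              # ~4 GB at 4-bit, plus some overhead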

I imagine I just switch the type in config to bfloat16 instead of just float16?

What? I don't think you need to do that, just don't check bf16 on textgen.

If you check 4bit, it should quantize when it loads.


γ€€γ€€γ€€γ€€γ€€πŸŒΈοΌžγ€€γ€€γƒ• Sorry if it sounded offensive, 
γ€€γ€€γ€€γ€€γ€€| γ€€_γ€€ _ l   that wasn't my intent.
γ€€    /` γƒŸοΌΏxγƒŽ 
γ€€γ€€ γ€€ /γ€€γ€€γ€€ γ€€ |
γ€€γ€€γ€€ /γ€€ ヽ   οΎ‰
γ€€ γ€€ β”‚γ€€γ€€|γ€€|γ€€|
 / ̄|γ€€γ€€ |γ€€|γ€€|
γ€€| ( ̄ヽ__ヽ_)__)
γ€€οΌΌδΊŒγ€

I imagine I just switch the type in config to bfloat16 instead of just float16?

What? I don't think you need to do that, just don't check bf16 on textgen.

In the yaml file for merges there's a section where it asks which precision type you want; I think that's where I change it, but I can't remember the name 🐥
Edit - I peeked, it's the dtype thing. I don't understand why it's dtype though? I thought it was a precision or quantization type.

Oh sorry, I had a big oof moment.

I thought you were talking about changing it before loading it on textgen.

Yes, you edit the merge_config.yaml

dtype: bfloat16
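
As I understand it, that dtype line just controls what the merged tensors are cast to and written out as. Done by hand with transformers it would look roughly like this (paths are placeholders):

    import torch
    from transformers import AutoModelForCausalLM

    # Load the merged model with its weights cast to bfloat16...
    model = AutoModelForCausalLM.from_pretrained(
        "path/to/merged-model", torch_dtype=torch.bfloat16
    )
    # ...and save it, so the shards on disk are stored as bf16.
    model.save_pretrained("path/to/merged-model-bf16")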

It appears I might have taken some of my own supply.


Anyways, excited to see more SOVL experiments.

jeiku released a new one; unfortunately, they also uploaded an FP16 file.
https://huggingface.co/jeiku/Orthocopter_8B
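
For what it's worth, you can usually tell what a repo was uploaded as from its config.json alone, without downloading the shards (assuming the uploader set torch_dtype):

    from transformers import AutoConfig

    cfg = AutoConfig.from_pretrained("jeiku/Orthocopter_8B")
    print(cfg.torch_dtype)  # e.g. torch.float16 or torch.bfloat16; None if unset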

dtype: bfloat16

All future models shall be bf16-ified!

Didn't mean to edit it 😭 buttons are small on mobile 😿

It appears I might have taken some of my own supply.


Anyways, excited to see more SOVL experiments.

jeiku released a new one; unfortunately, they also uploaded an FP16 file.
https://huggingface.co/jeiku/Orthocopter_8B

I like the SOVL style. It doesn't really make the smartest models, or leaderboard-breaking models, but they're always great fun and pretty consistent.

Time for more bf16 preaching I guess @_@

saishf/Merge-Mayhem-L3-V2 + saishf/SOVLish-Maid-L3-8B with my weird slerp idea might be interesting. BF16 of course.

You need to redo the merges though. :(
Merging two FP16 models and saving as BF16 will cause loss issues.
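
A toy example of the rounding involved; not a measure of real-world damage, just showing the fp16 -> bf16 cast is not lossless:

    import torch

    x = torch.tensor([0.1234567], dtype=torch.float16)
    y = x.to(torch.bfloat16)

    print(x.item())                               # fp16 value (~4 significant digits)
    print(y.item())                               # bf16 value (~3 significant digits)
    print((x.float() - y.float()).abs().item())   # rounding error introduced by the cast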

I don't like bf16 models at all; the inference quality is inferior to fp16, and if we only get model weights in bfloat16, that is a loss for the community.
I like to run inference in float16 and float32 on TGI or vLLM.

If you are just messing around on your local machine, then consider using a quantized model until your project is ready for real inference.

Also, bfloat16 is a Google format, whereas float16 is an IEEE standard.

Here are some comparisons that I found:

    |--------+------+----------+----------|
    | Format | Bits | Exponent | Fraction |
    |--------+------+----------+----------|
    | FP32   |   32 |        8 |       23 |
    | FP16   |   16 |        5 |       10 |
    | BF16   |   16 |        8 |        7 |
    |--------+------+----------+----------|

Range
bfloat16: ~1.18e-38 … ~3.40e38, with about 3 significant decimal digits of precision.
float16: ~5.96e-8 (smallest subnormal; smallest normal ~6.10e-5) … 65504, with about 4 significant decimal digits of precision.

Epsilon comparison:

    |--------+------------|
    | Format |    Epsilon |
    |--------+------------|
    | FP32   | 0.00000012 |
    | FP16   | 0.00097656 |
    | BF16   | 0.00781250 |
    |--------+------------|

Dynamic Range comparison:

    |--------+-------|
    | Format | DR    |
    |--------+-------|
    | FP32   | 83.38 |
    | BF16   | 78.57 |
    | FP16   | 12.04 |
    |--------+-------|

Posits comparison:

    |----+--------+-----------|
    | es |     DR |   epsilon |
    |----+--------+-----------|
    |  1 |  16.86 | 0.0000076 |
    |  2 |  33.82 | 0.0000153 |
    |  3 |  37.43 | 0.0000305 |
    |  4 | 143.86 | 0.0000610 |
    |----+--------+-----------|
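
The FP32/FP16/BF16 rows can be sanity-checked with torch.finfo; the dynamic-range figures line up if you take log10 of max over the smallest subnormal (tiny * eps):

    import math
    import torch

    for dtype in (torch.float32, torch.float16, torch.bfloat16):
        fi = torch.finfo(dtype)              # eps = spacing at 1.0, tiny = smallest normal
        smallest_subnormal = fi.tiny * fi.eps
        dynamic_range = math.log10(fi.max / smallest_subnormal)
        print(f"{str(dtype):15s} eps={fi.eps:.8f} max={fi.max:.3e} "
              f"tiny={fi.tiny:.3e} DR={dynamic_range:.2f}")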

I'm not going to pretend I know what I'm talking about.


I can see why it would be annoying for models to be in bf16 when you run pure inference on fp16.
You would see quality loss due to bf16 -> fp16 conversion.

I just wanted bf16 so I could use saishf's models in merges without fear of possible degradation from merging fp16 models with bf16 models.
I am most likely overstating this issue; it might not be as bad as I think it is.

Ultimately, it is up to saishf to decide how they want to distribute their merges.
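
For what it's worth, the bf16 -> fp16 direction mostly loses range rather than precision: values outside fp16's window overflow to inf or flush to zero, while in-range values keep their fraction bits. A toy example (real weights usually sit well inside fp16's range, which is probably why it often works out fine):

    import torch

    x = torch.tensor([1.0e5, 3.0e38, 1.0e-30], dtype=torch.bfloat16)
    print(x.to(torch.float16))  # tensor([inf, inf, 0.], dtype=torch.float16)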

I'm not going to pretend I know what I'm talking about.


I can see why it would be annoying for models to be in bf16 when you run pure inference on fp16.
You would see quality loss due to bf16 -> fp16 conversion.

I just wanted bf16 so I could use saishf's models in merges without fear of possible degradation from merging fp16 models with bf16 models.
I am most likely overstating this issue; it might not be as bad as I think it is.

Ultimately, it is up to saishf to decide how they want to distribute their merges.

Reading these, it sounds like models are converted to bf16/fp16 before the merge occurs?

https://github.com/arcee-ai/mergekit/issues/204
https://github.com/arcee-ai/mergekit/issues/50

There is also

dtype: Specifies the data type used for the merging operation.

Which makes me think the models' data type will be changed beforehand.
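
My rough mental model of that line, as a toy sketch and definitely not mergekit's actual code: both input tensors get cast to the merge dtype up front, the interpolation math is done in fp32 for stability, and the result is stored back in the merge dtype.

    import torch

    def slerp(a: torch.Tensor, b: torch.Tensor, t: float) -> torch.Tensor:
        """Spherical interpolation between two weight tensors (math in fp32)."""
        a32, b32 = a.flatten().float(), b.flatten().float()
        cos_omega = torch.dot(a32, b32) / (a32.norm() * b32.norm())
        omega = torch.arccos(cos_omega.clamp(-1.0, 1.0))
        if omega.abs() < 1e-6:  # nearly colinear: fall back to plain lerp
            out = (1 - t) * a32 + t * b32
        else:
            out = (torch.sin((1 - t) * omega) * a32 + torch.sin(t * omega) * b32) / torch.sin(omega)
        return out.reshape(a.shape)

    merge_dtype = torch.bfloat16                    # what the "dtype:" key sets
    w_a = torch.randn(4, 4, dtype=torch.float16)    # tensor from an fp16 model
    w_b = torch.randn(4, 4, dtype=torch.bfloat16)   # tensor from a bf16 model

    # Cast both inputs to the merge dtype first, merge, store the result in it too.
    merged = slerp(w_a.to(merge_dtype), w_b.to(merge_dtype), t=0.5).to(merge_dtype)
    print(merged.dtype)  # torch.bfloat16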

I'm going to open an issue/question over in the mergekit repo, 'cause I don't really have a definite answer 😶‍🌫️
Github - https://github.com/arcee-ai/mergekit/issues/316

I'm curious about this too, as I don't know about the impacts when merging. When I wrote my previous post, I had just made my first merge, and it is a mixture of bf16 and fp16:

https://huggingface.co/solidrust/KatyTestHistorical-SultrySilicon-7B-V2

And if I understand @Virt-io correctly, there is a potential quality loss in that merge.

@Suparious

I'm also curious, and not knowledgeable enough to answer my own question.

Hopefully, the folks over at arcee-ai can answer our question and put our minds at ease.
