This model may have a quantization problem sometimes (I think)
Sometimes a prompt crashes the generation and displays: "RuntimeError: probability tensor contains either inf, nan or element < 0"
The model "LoneStriker_MiquMaid-v2-70B-4.65bpw-h6-exl2" don't have this problem.
Every time I get this error I have to reload the model, but after one or two messages the error comes back.
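For context, here's a minimal sketch of where I assume this error originates (my own reproduction, not taken from the webui code): the sampler draws the next token with torch.multinomial, which raises this exact RuntimeError when the probability tensor is corrupted.

```python
import torch

# One corrupted logit (nan) is enough: softmax propagates it into every
# probability, and torch.multinomial then refuses to sample from the tensor.
logits = torch.tensor([1.0, 2.0, float("nan")])
probs = torch.softmax(logits, dim=-1)  # probs is now all nan
torch.multinomial(probs, num_samples=1)
# RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
```

So any inf/nan produced earlier in the forward pass only surfaces later, at the sampling step.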
https://github.com/facebookresearch/llama/issues/380#issuecomment-1900603862
After searching online and finding the GitHub issue above, I decided to test the non-DPO version, and so far I've had no problems despite not changing any settings.
Is it possible for this model to produce "unk" or "nan" tokens (something like that) because of its finetuning?
Well, I'm seeing the same problem with the "LoneStriker_MiquMaid-v2-70B-4.65bpw-h6-exl2" model now. Maybe I'm wrong... :/
The amazing thing is that when "RuntimeError: probability tensor contains either inf, nan or element < 0" occurs, just reloading the model makes it work again for a prompt or two. Otherwise, regenerating a response several times is enough to trigger the error again. This points to instability: perhaps a lack of precision, or an error introduced during finetuning.
A possible solution would be to use BF16 instead of FP16, if the problem is due to a lack of precision.
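If anyone wants to test that precision theory on the unquantized weights (it won't apply directly to the exl2 quant, which is its own format), here's a minimal Transformers sketch; the model id is just a placeholder for whichever repo you're testing.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NeverSleep/MiquMaid-v2-70B-DPO"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 keeps FP32's exponent range, so large
                                 # activations overflow to inf far less often
                                 # than in FP16
    device_map="auto",
)
```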
In fact, there are many possible causes for this type of error.
I'm on a fairly stable environment (Linux Mint 21.3, Linux kernel 6.5.0-17, Nvidia driver 545, 2x RTX 3090).
Well, I found a solution: use the ExLlamav2 model loader instead of ExLlamav2_HF.
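If you launch text-generation-webui from the command line, I believe the same switch can be made with the --loader flag (treat the exact flag and loader names as assumptions for your version):

```
python server.py --model LoneStriker_MiquMaid-v2-70B-4.65bpw-h6-exl2 --loader ExLlamav2
```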
I'll open an issue on exllamav2's GitHub if that would help track down the problem.
I'm sorry, I thought it was the finetuning or something related to your model (besides, I tested Kooten's quant and had the same problem, with and without DPO).
I generally only quantize models from others as I only have a handful of my own model fine-tunes.