An interesting, if ultimately inconsequential, consideration about whether the FP16 weights are out there or not.
I noticed something interesting: Miqu-1-70b's Q2_K is 25.5 GB.
That matches the recent llama.cpp Q2_K quantizations, from January 2024, at barely 3 bpw.
The previous GGUF Q2_K quants from 2023 were almost the size of a Q3_K_S, at around 3.4 bpw.
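For reference, the bpw figure follows directly from the file size; a quick back-of-the-envelope sketch (the decimal-GB reading of the Hugging Face size and the ~69B parameter count for a Llama-2-70B-class model are my assumptions):

```python
# Back-of-the-envelope bits per weight from the reported file size.
# Assumptions: 25.5 GB is decimal gigabytes (as Hugging Face reports sizes),
# and ~69e9 parameters for a Llama-2-70B-class model.
file_bits = 25.5e9 * 8
params = 69e9
print(round(file_bits / params, 2))  # ~2.96, i.e. "barely 3 bpw"
```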
So, Miqu-1-70b's Q2_K was made in January 2024.
Either Miqudev requantized from an earlier Q5_K_M, or he quantized from a Q8_0... or an FP16.
I'm not an expert on the internals of the GGUF format, but is there a metadata key specifying that a quant is actually a requant?
If there is, we could find out.
In any case, that would lead us nowhere, but still!
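For what it's worth, the metadata KV pairs can be dumped without llama.cpp at all; here's a minimal sketch, assuming the GGUF v2/v3 on-disk layout (little-endian: 4-byte magic, uint32 version, uint64 tensor count, uint64 KV count, then key/value pairs; strings are a uint64 length followed by UTF-8 bytes):

```python
# Minimal GGUF metadata dump (GGUF v2/v3 layout assumed).
import struct
import sys

# GGUF metadata value types, per the spec
(T_UINT8, T_INT8, T_UINT16, T_INT16, T_UINT32, T_INT32,
 T_FLOAT32, T_BOOL, T_STRING, T_ARRAY, T_UINT64, T_INT64, T_FLOAT64) = range(13)
SCALAR = {T_UINT8: "<B", T_INT8: "<b", T_UINT16: "<H", T_INT16: "<h",
          T_UINT32: "<I", T_INT32: "<i", T_FLOAT32: "<f", T_BOOL: "<?",
          T_UINT64: "<Q", T_INT64: "<q", T_FLOAT64: "<d"}

def read_value(f, vtype):
    if vtype == T_STRING:
        (n,) = struct.unpack("<Q", f.read(8))
        return f.read(n).decode("utf-8", errors="replace")
    if vtype == T_ARRAY:
        etype, n = struct.unpack("<IQ", f.read(12))
        return [read_value(f, etype) for _ in range(n)]
    fmt = SCALAR[vtype]
    return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]

with open(sys.argv[1], "rb") as f:
    assert f.read(4) == b"GGUF"
    version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
    print(f"version={version} tensors={n_tensors} kv={n_kv}")
    for _ in range(n_kv):
        key = read_value(f, T_STRING)
        (vtype,) = struct.unpack("<I", f.read(4))
        value = read_value(f, vtype)
        if isinstance(value, list) and len(value) > 8:
            value = value[:8] + ["..."]  # tokenizer vocab arrays are huge
        print(key, "=", value)
```

As far as I can tell, llama.cpp doesn't write a dedicated "requant" flag, but keys like `general.file_type` and `general.name` are set at quantization time, so a dump may still hint at the source.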
Considering that this person was an employee of a company which had been given only the quantized versions, I don't think it's possible for it to come from the FP16. Either it was a requantization of the Q5, or Mistral quantized it right before handing the files over to the company.
When that early access was likely given, the Q2_K variant used in Miqudev's quant didn't exist yet (why present an already obsolete quant to a customer while facing ferocious competition?).
Hence the question.
Yeah, makes sense. I didn't realize the early access had been given a while ago; I thought it might have been given recently. I still believe it was a requantization, though, as the Q5 was most likely the one given to them.
We could at least check whether the result of Q5 -> F16 -> Q2 is identical to the uploaded checkpoint. If it is, it's more than likely that it was requantized in that fashion.
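Here's a quick sketch of that comparison, assuming the `gguf` Python package that ships with llama.cpp (`pip install gguf`); the filenames are placeholders:

```python
# Compare per-tensor bytes of two GGUF files, ignoring metadata.
import hashlib

from gguf import GGUFReader

def tensor_hashes(path: str) -> dict:
    """Map each tensor name to a SHA-256 digest of its raw (quantized) bytes."""
    reader = GGUFReader(path)
    return {t.name: hashlib.sha256(t.data.tobytes()).hexdigest()
            for t in reader.tensors}

uploaded = tensor_hashes("miqu-1-70b.q2_K.gguf")    # the uploaded checkpoint
requant = tensor_hashes("miqu-q5-f16-q2_K.gguf")    # our Q5 -> F16 -> Q2 requant
diff = [name for name in uploaded if uploaded.get(name) != requant.get(name)]
print("identical" if not diff else f"{len(diff)} tensors differ")
```

One caveat: the check is only meaningful if the requant is produced with the same llama.cpp revision, since the k-quant code has changed over time.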
All three quants have a `general.name` of "D:\HF", which is strong evidence that all of them were produced from something else for the HF upload. Edit: and in fact, all metadata KVs other than the file type are identical.
This is the first model that could answer all my test questions (GPT-4 included in the comparison). I wish there were a GPTQ or AWQ version (4-bit) so the speed would be more practical...