Nexesenex committed
Commit
6c09369
1 Parent(s): c0066c8

Update README.md

Files changed (1): README.md (+31 -6)
README.md CHANGED
@@ -1,12 +1,21 @@
  Requantization of a Q5_K_M quant of a trending 70b model without better quant/fp16 available, this through a Q8_0 intermediary step.

- Model has a theta of 1,000,000, and not 10,000, like Llama 2 models usually have.

- So, no Alpha or Rope Base Frequency up to its base 32k context, if it works as intended.

- And if it does, no linear rope is necessary either to reach the base 32k context.

- Benchs I made with the original Q2_K quant published by Miqudev :

  - miqu-1-70b.q2_K.gguf,-,Hellaswag,87.75,,400,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
  - miqu-1-70b.q2_K.gguf,-,Hellaswag,86.5,,1000,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
@@ -43,7 +52,7 @@ Benchs I made with the Q3_K_M I quantized from Miqudev's Q5_K_M with an intermed
  - miqu-1-70b.Q3_K_M.gguf,-,wikitext,4.2957,512,512,2024-01-29 01:40:00,RBF1000000,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,81
  - miqu-1-70b.Q3_K_M.gguf,-,wikitext,3.8380,512,512,2024-01-29 01:40:00,RBF1000000,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,655
 
- And now, the IQ3_XXS, new SOTA 3 bits quant from LlamaCPP, made in the same way :
 
  - miqu-1-70b.IQ3_XXS.gguf,-,Hellaswag,89,,400,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
  - miqu-1-70b.IQ3_XXS.gguf,-,Hellaswag,88.3,,1000,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
@@ -58,4 +67,20 @@ And now, the IQ3_XXS, new SOTA 3 bits quant from LlamaCPP, made in the same way
  - miqu-1-70b.IQ3_XXS.gguf,-,wikitext,4.0309,512,512,2024-01-29 01:40:00,RBF1000000,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,655
  - miqu-1-70b.IQ3_XXS.gguf,-,wikitext,3.5141,4096,4096,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
 
- TbC

  Requantization of a Q5_K_M quant of a trending 70b model without better quant/fp16 available, this through a Q8_0 intermediary step.

+ Miqu 70b has a theta of 1,000,000, like CodeLlama, and not 10,000, like Llama 2 models usually have.
+ To my knowledge, that feature sets it apart from ALL other Llama 2 models, besides the CodeLlamas.

+ So, no Alpha or Rope Base Frequency change is needed up to its base 32k context, if it works as intended.
+ And if it does, no linear/YaRN rope scaling is necessary either to reach the base 32k context.
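The theta above is RoPE's base frequency: it sets how slowly the positional rotations advance across the head dimensions, which is what stretches the usable context. A minimal sketch of that effect (plain Python; the function names are mine, and head_dim = 128 is the Llama 2 70b head size, not stated in this README):

```python
import math

def rope_inv_freqs(theta: float, head_dim: int = 128):
    """Per-pair RoPE inverse frequencies: theta^(-2i/d) for i = 0..d/2-1."""
    return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def slowest_angle(theta: float, pos: int = 32768, head_dim: int = 128) -> float:
    """Rotation angle of the slowest-turning pair at the 32k context edge."""
    return pos * rope_inv_freqs(theta, head_dim)[-1]

# With theta = 1e6 the slowest dimensions rotate roughly two orders of
# magnitude less per position than with theta = 1e4, which is why no
# Alpha / rope-base override should be needed up to 32k.
print(slowest_angle(10_000.0))     # Llama 2 default theta
print(slowest_angle(1_000_000.0))  # Miqu / CodeLlama-style theta
```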

+ But Miqu is NOT a CodeLlama 70b (released only a few days after Miqu 70b), because:
+
+ - While the Theta of CodeLlama 70b is claimed to be 1,000,000, its base rope actually seems to be 10,000 (see the benchs below).
+ - Which means that CodeLlama 70b might be context-limited as Llama 2 is, instead of having a baseline of 100,000 ctx max.
+ - Meanwhile, Miqu's perplexity is close to 70b Llama 2's (less than 4 at 512 ctx), while CodeLlama 70b's is around 5.5 at least.
+ - The benchs less sensitive to quantization (Hellaswag and Winogrande, but others as well) confirm this.
+
+ So, CodeLlama 70b is nerfed in general-benchmark terms like the other CodeLlamas, while Miqu matches the expectations of a FINETUNED Llama 2.
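For reference, the wikitext figures below are perplexities in the usual sense: the exponential of the mean negative log-likelihood over the evaluated tokens (lower is better). A minimal sketch of that reduction, with made-up per-token log-probabilities purely for illustration:

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-likelihood over the tokens."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Hypothetical natural-log token probabilities for two models:
sharp = [-1.2, -1.4, -1.3, -1.5]   # more confident model -> lower PPL
blunt = [-2.1, -2.4, -2.2, -2.3]   # less confident model -> higher PPL
print(perplexity(sharp))
print(perplexity(blunt))
```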
+
+ Benchs I made with the original Q2_K quant of Miqu 70b, made from the FP16 and published by Miqudev:

  - miqu-1-70b.q2_K.gguf,-,Hellaswag,87.75,,400,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
  - miqu-1-70b.q2_K.gguf,-,Hellaswag,86.5,,1000,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,miqudev,
 
  - miqu-1-70b.Q3_K_M.gguf,-,wikitext,4.2957,512,512,2024-01-29 01:40:00,RBF1000000,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,81
  - miqu-1-70b.Q3_K_M.gguf,-,wikitext,3.8380,512,512,2024-01-29 01:40:00,RBF1000000,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,655
 
+ And now, the IQ3_XXS, the new SOTA 3-bit quant from llama.cpp, which I made in the same way:
 
  - miqu-1-70b.IQ3_XXS.gguf,-,Hellaswag,89,,400,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
  - miqu-1-70b.IQ3_XXS.gguf,-,Hellaswag,88.3,,1000,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,
 
  - miqu-1-70b.IQ3_XXS.gguf,-,wikitext,4.0309,512,512,2024-01-29 01:40:00,RBF1000000,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,655
  - miqu-1-70b.IQ3_XXS.gguf,-,wikitext,3.5141,4096,4096,2024-01-29 01:40:00,,70b,Mistral_Medium,32768,,,GGUF,miqudev,Nexesenex,

+ Meanwhile, CodeLlama 70b Q2_K benches as follows, to compare with the Miqu 70b Q2_K originally quantized from FP16 by Miqudev:
+
+ - CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Hellaswag,76.5,,400,2024-01-30 01:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
+ - CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Hellaswag,76.2,,1000,2024-01-30 01:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
+ - CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Hellaswag_Bin,69.75,,400,2024-01-30 01:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
+ - CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Hellaswag_Bin,72.5,,1000,2024-01-30 01:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
+ - CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Arc-Challenge,35.11705686,,299,2024-01-30 05:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
+ - CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Arc-Easy,58.77192982,,570,2024-01-30 05:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
+ - CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,MMLU,36.10223642,,313,2024-01-30 05:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
+ - CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Thruthful-QA,31.08935129,,817,2024-01-30 05:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
+ - CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,Winogrande,70.3236,,1267,2024-01-30 05:40:00,,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,
+ - CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,,512,512,2024-01-30 01:40:00,RBF1000000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,655
+ - CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,9.7866,512,512,2024-01-30 01:40:00,RBF1000000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
+ - CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,8.5822,512,512,2024-01-30 01:40:00,RBF500000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
+ - CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,7.1098,512,512,2024-01-30 01:40:00,RBF100000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
+ - CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,6.8224,512,512,2024-01-30 01:40:00,RBF50000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
+ - CodeLlama-70b-Instruct-hf-Q2_K.gguf,-,wikitext,6.5705,512,512,2024-01-30 01:40:00,RBF10000,70b,CodeLlama,32768,,,GGUF,Meta,Lonestriker,81
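The RBF values swept above are rope base-frequency overrides (perplexity dropping as the base approaches 10,000 is what suggests CodeLlama 70b's real theta). For readers used to the Alpha convention instead, NTK-aware scaling is commonly described as mapping an alpha factor to a base override of base * alpha^(d/(d-2)), with d the head dimension (128 here, so the exponent is 64/63). A minimal sketch under that assumption (the function name is mine):

```python
def ntk_base_from_alpha(alpha: float, base: float = 10_000.0, head_dim: int = 128) -> float:
    """NTK-aware scaling: an alpha context multiplier maps to a rope
    base override of base * alpha^(d/(d-2))."""
    return base * alpha ** (head_dim / (head_dim - 2))

# alpha = 1 leaves the base at 10,000; larger alphas approximate the
# RBF overrides swept in the benchs above.
print(ntk_base_from_alpha(1.0))
print(ntk_base_from_alpha(8.0))
```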