leafspark committed
Commit 93f0776
Parent(s): 0c07d9a

Add new quants

Files changed (1):
  README.md (+17 -9)
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- license: llama3.1
+ license: llama3
  language:
  - en
  - de
@@ -43,10 +43,10 @@ GGUF quantized models of [mattshumer/ref_70_e3](https://huggingface.co/mattshume
  | Q4_K_L | 45.3GB | false | false |
  | Q4_K_M | ??.?GB | false | false |
  | Q4_K_S | 40.3GB | false | false |
- | IQ4_NL | ??.?GB | false | true |
+ | IQ4_NL | 38.2GB | false | true |
  | IQ4_XS | ??.?GB | false | true |
  | Q3_K_XL | 37.2GB | false | false |
- | Q3_K_L | ??.?GB | false | false |
+ | Q3_K_L | 37.1GB | false | false |
  | Q3_K_M | 34.3GB | false | false |
  | IQ3_M | ??.?GB | false | true |
  | Q3_K_S | ??.?GB | false | false |
@@ -56,12 +56,12 @@ GGUF quantized models of [mattshumer/ref_70_e3](https://huggingface.co/mattshume
  | IQ3_XXS | ??.?GB | false | true |
  | Q2_K | ??.?GB | false | false |
  | Q2_K_S | ??.?GB | false | true |
- | IQ2_M | ??.?GB | false | true |
- | IQ2_S | ??.?GB | false | true |
- | IQ2_XS | ??.?GB | false | true |
- | IQ2_XXS | ??.?GB | false | true |
- | IQ1_M | ??.?GB | false | true |
- | IQ1_S | ??.?GB | false | true |
+ | IQ2_M | 23.0GB | false | true |
+ | IQ2_S | 21.2GB | false | true |
+ | IQ2_XS | 20.2GB | false | true |
+ | IQ2_XXS | 18.2GB | false | true |
+ | IQ1_M | 16.0GB | false | true |
+ | IQ1_S | 14.6GB | false | true |

  The `_L` or `_XL` suffix means that the token embeddings and output weight are at fp16 precision.

@@ -69,9 +69,17 @@ The iMatrix dataset is bartowski's, which you can find here: [calibration_datav3

  Computation is done on static Q6_K for 125 chunks.

+ ## Model Info
+
+ The model was not trained for 3 epochs, because it's identical to the 2nd-epoch run [mattshumer/Reflection-Llama-3.1-70B-ep2-working](https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B-ep2-working) (it's possible this is also fake).
+
+ The fine-tuning was done using LoRA with rank 256 on the Llama-3.1-70B-Instruct model.
+
  ## Benchmarks
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/60518f3731c5be7f3dd5ebc3/zNs-ZFs0SbnomH7mikiOU.png)

+ **Warning: These are likely false scores and cannot be replicated with this model.**
+
  All benchmarks tested have been checked for contamination by running [LMSys's LLM Decontaminator](https://github.com/lm-sys/llm-decontaminator). When benchmarking, we isolate the `<output>` and benchmark solely on that section.

  Trained from Llama 3.1 70B Instruct, you can sample from Reflection Llama-3.1 70B using the same code, pipelines, etc. as any other Llama model. It even uses the stock Llama 3.1 chat template format (though we've trained in a few new special tokens to aid in reasoning and reflection).
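For reference, here is a minimal sketch of how an `_L`/`_XL` quant such as Q3_K_XL in the table above could be produced with llama.cpp's `llama-quantize`, keeping the token embeddings and output weight at fp16 as the card describes. The filenames and the exact recipe are assumptions; the commit only documents the resulting precision.

```python
import subprocess

# Hedged sketch: an `_XL`-style quant via llama.cpp's llama-quantize.
# The base type is Q3_K_L, with token embeddings and the output weight
# overridden to fp16. Filenames are hypothetical.
subprocess.run(
    [
        "./llama-quantize",
        "--token-embedding-type", "f16",  # keep token embeddings at fp16
        "--output-tensor-type", "f16",    # keep output.weight at fp16
        "ref_70_e3-f16.gguf",             # hypothetical fp16 source GGUF
        "ref_70_e3-Q3_K_XL.gguf",         # hypothetical output file
        "Q3_K_L",                         # base quantization type
    ],
    check=True,
)
```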
 
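The iMatrix step described in the card (bartowski's calibration_datav3.txt, computed on static Q6_K for 125 chunks) maps roughly onto llama.cpp's `llama-imatrix` tool as follows; the filenames are hypothetical.

```python
import subprocess

# Hedged sketch of the importance-matrix computation described above:
# llama-imatrix over the calibration text against the static Q6_K quant,
# capped at 125 chunks.
subprocess.run(
    [
        "./llama-imatrix",
        "-m", "ref_70_e3-Q6_K.gguf",      # static Q6_K, per the card
        "-f", "calibration_datav3.txt",   # bartowski's calibration dataset
        "-o", "imatrix.dat",              # importance matrix output
        "--chunks", "125",                # 125 chunks, per the card
    ],
    check=True,
)
```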
 
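The new Model Info section states the fine-tune used LoRA with rank 256 on Llama-3.1-70B-Instruct. A sketch of a matching PEFT configuration, where alpha, dropout, and target modules are assumptions (the card gives only the rank):

```python
from peft import LoraConfig

# Sketch of a LoRA setup consistent with the card: rank 256 on
# Llama-3.1-70B-Instruct. Everything except the rank is an assumption.
lora_config = LoraConfig(
    r=256,            # rank stated in the card
    lora_alpha=256,   # assumption: not stated in the card
    lora_dropout=0.0, # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
```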
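The benchmarking note says scoring isolates the `<output>` span of each response. A minimal sketch of that extraction, assuming Reflection-style `<output>...</output>` tags:

```python
import re

def extract_output(response: str) -> str:
    """Return only the <output>...</output> span of a Reflection-style
    response, falling back to the full text if the tags are absent."""
    match = re.search(r"<output>(.*?)</output>", response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()
```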
77
+
78
  ## Benchmarks
79
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/60518f3731c5be7f3dd5ebc3/zNs-ZFs0SbnomH7mikiOU.png)
80
 
81
+ **Warning: These are likely false scores and cannot be replicated with this model.**
82
+
83
  All benchmarks tested have been checked for contamination by running [LMSys's LLM Decontaminator](https://github.com/lm-sys/llm-decontaminator). When benchmarking, we isolate the `<output>` and benchmark on solely that section.
84
 
85
  Trained from Llama 3.1 70B Instruct, you can sample from Reflection Llama-3.1 70B using the same code, pipelines, etc. as any other Llama model. It even uses the stock Llama 3.1 chat template format (though, we've trained in a few new special tokens to aid in reasoning and reflection).
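Since the card says the model uses the stock Llama 3.1 chat template, sampling one of these GGUF quants should work like any other Llama model. A hedged sketch with llama-cpp-python, using an illustrative quant filename:

```python
from llama_cpp import Llama

# Hedged sketch: sampling a GGUF quant with llama-cpp-python. The stock
# Llama 3.1 chat template embedded in the GGUF metadata is applied
# automatically by create_chat_completion.
llm = Llama(model_path="ref_70_e3.Q4_K_S.gguf", n_ctx=4096)
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
    max_tokens=512,
)
print(result["choices"][0]["message"]["content"])
```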