leafspark
/

Reflection-Llama-3.1-70B-GGUF

@@ -1,5 +1,5 @@
 ---
-license: llama3.1
 language:
 - en
 - de
@@ -43,10 +43,10 @@ GGUF quantized models of [mattshumer/ref_70_e3](https://huggingface.co/mattshume
 | Q4_K_L       | 45.3GB | false | false   |
 | Q4_K_M       | ??.?GB | false | false   |
 | Q4_K_S       | 40.3GB | false | false   |
-| IQ4_NL       | ??.?GB | false | true    |
 | IQ4_XS       | ??.?GB | false | true    |
 | Q3_K_XL      | 37.2GB | false | false   |
-| Q3_K_L       | ??.?GB | false | false   |
 | Q3_K_M       | 34.3GB | false | false   |
 | IQ3_M        | ??.?GB | false | true    |
 | Q3_K_S       | ??.?GB | false | false   |
@@ -56,12 +56,12 @@ GGUF quantized models of [mattshumer/ref_70_e3](https://huggingface.co/mattshume
 | IQ3_XXS      | ??.?GB | false | true    |
 | Q2_K         | ??.?GB | false | false   |
 | Q2_K_S       | ??.?GB | false | true    |
-| IQ2_M        | ??.?GB | false | true    |
-| IQ2_S        | ??.?GB | false | true    |
-| IQ2_XS       | ??.?GB | false | true    |
-| IQ2_XXS      | ??.?GB | false | true    |
-| IQ1_M        | ??.?GB | false | true    |
-| IQ1_S        | ??.?GB | false | true    |
 The `_L` or `_XL` suffix means that the token embeddings and output weight are at fp16 precision.
@@ -69,9 +69,17 @@ The iMatrix dataset is bartowski's, which you can find here: [calibration_datav3
 Computation is done on static Q6_K for 125 chunks.
 ## Benchmarks
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/60518f3731c5be7f3dd5ebc3/zNs-ZFs0SbnomH7mikiOU.png)
 All benchmarks tested have been checked for contamination by running [LMSys's LLM Decontaminator](https://github.com/lm-sys/llm-decontaminator). When benchmarking, we isolate the `<output>` and benchmark on solely that section.
 Trained from Llama 3.1 70B Instruct, you can sample from Reflection Llama-3.1 70B using the same code, pipelines, etc. as any other Llama model. It even uses the stock Llama 3.1 chat template format (though, we've trained in a few new special tokens to aid in reasoning and reflection).

 ---
+license: llama3
 language:
 - en
 - de
 | Q4_K_L       | 45.3GB | false | false   |
 | Q4_K_M       | ??.?GB | false | false   |
 | Q4_K_S       | 40.3GB | false | false   |
+| IQ4_NL       | 38.2GB | false | true    |
 | IQ4_XS       | ??.?GB | false | true    |
 | Q3_K_XL      | 37.2GB | false | false   |
+| Q3_K_L       | 37.1GB | false | false   |
 | Q3_K_M       | 34.3GB | false | false   |
 | IQ3_M        | ??.?GB | false | true    |
 | Q3_K_S       | ??.?GB | false | false   |
 | IQ3_XXS      | ??.?GB | false | true    |
 | Q2_K         | ??.?GB | false | false   |
 | Q2_K_S       | ??.?GB | false | true    |
+| IQ2_M        | 23.0GB | false | true    |
+| IQ2_S        | 21.2GB | false | true    |
+| IQ2_XS       | 20.2GB | false | true    |
+| IQ2_XXS      | 18.2GB | false | true    |
+| IQ1_M        | 16.0GB | false | true    |
+| IQ1_S        | 14.6GB | false | true    |
 The `_L` or `_XL` suffix means that the token embeddings and output weight are at fp16 precision.
 Computation is done on static Q6_K for 125 chunks.
+## Model Info
+The model not trained on 3 epoches, because it's identical to the 2nd epoch run [mattshumer/Reflection-Llama-3.1-70B-ep2-working](https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B-ep2-working) (it's possible this is also fake).
+The fine-tuning was done using LoRA with rank 256 on the Llama-3.1-70B-Instruct model.
 ## Benchmarks
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/60518f3731c5be7f3dd5ebc3/zNs-ZFs0SbnomH7mikiOU.png)
+**Warning: These are likely false scores and cannot be replicated with this model.**
 All benchmarks tested have been checked for contamination by running [LMSys's LLM Decontaminator](https://github.com/lm-sys/llm-decontaminator). When benchmarking, we isolate the `<output>` and benchmark on solely that section.
 Trained from Llama 3.1 70B Instruct, you can sample from Reflection Llama-3.1 70B using the same code, pipelines, etc. as any other Llama model. It even uses the stock Llama 3.1 chat template format (though, we've trained in a few new special tokens to aid in reasoning and reflection).