DavidAU committed
Commit 3b2cf8c
1 Parent(s): d20c427

Update README.md

Files changed (1): README.md (+5 -3)
@@ -34,14 +34,16 @@ tags:
 pipeline_tag: text-generation
 ---
 
-<B>Updates Dec 21 2024: (uploading quants ... refreshed, and new quants):</B>
+<B>L3-Dark-Planet-8B-GGUF - Updates Dec 21 2024: (uploading quants ... refreshed, and new quants):</B>
 - All quants have been "refreshed", quanted with the latest LLAMACPP improvements: better instruction following and output generation across all quants.
 - All quants have also been upgraded with "more bits" for the output tensor and embed for better performance (this is in addition to the "refresh").
+- For all quants (including the new "ARM" quants), the output tensor is set at Q8_0. Embed has also been upgraded.
 - New "ARM" quants have been added for machines that can run them (format: ".../Q4_0_4_4.gguf").
-- New specialized quants (in addition to standard): "max, max-cpu" (included in the file name) for quants "Q2K" (max-cpu only), "IQ4_XS", "Q6_K" and "Q8_0".
+- New specialized quants (in addition to the new refresh/upgrades): "max, max-cpu" (included in the file name) for quants "Q2K" (max-cpu only), "IQ4_XS", "Q6_K" and "Q8_0".
 - "MAX": output tensor / embed at float 16 (better instruction following/output generation than standard quants).
 - "MAX-CPU": output tensor / embed at bfloat 16, which forces these onto the CPU (Nvidia cards / others will vary); this frees up VRAM at a cost in tokens/second, and you get better instruction following/output generation too.
-
+- Q8_0 (Max, Max-CPU) now clocks in at almost 10 bits (average).
+-
 <h2>L3-Dark-Planet-8B-GGUF</h2>
 
 <img src="dark-planet.jpg" style="float:right; width:300px; height:300px; padding:10px;">
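
The "almost 10 bits (average)" figure for the Max/Max-CPU Q8_0 quant can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes the standard llama.cpp Q8_0 block layout (one fp16 scale plus 32 int8 values per 32 weights); the parameter split is an illustrative assumption based on a typical Llama-3-8B-class layout, not exact tensor counts from this repo.

```python
# Rough bits-per-weight (bpw) arithmetic for llama.cpp quant formats.
# Q8_0 stores weights in blocks of 32: one fp16 scale (2 bytes)
# plus 32 int8 values (32 bytes) -> 34 bytes per 32 weights.
Q8_0_BPW = 34 * 8 / 32   # 8.5 bits per weight
F16_BPW = 16.0

# Assumed split for an 8B Llama-3-class model (illustrative only):
# token embedding + output tensor ~1.05B params, rest ~6.98B params.
embed_and_output = 1.05e9   # kept at float 16 in the "MAX" variant (assumption)
other_weights = 6.98e9      # kept at Q8_0 (assumption)

total_bits = embed_and_output * F16_BPW + other_weights * Q8_0_BPW
avg_bpw = total_bits / (embed_and_output + other_weights)

print(f"Q8_0 baseline: {Q8_0_BPW} bpw")
print(f"MAX (f16 embed/output) average: {avg_bpw:.2f} bpw")
```

With these assumed counts the average lands between 9 and 10 bits per weight, consistent with the "almost 10 bits" claim above.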