DavidAU/L3-Dark-Planet-8B-GGUF

DavidAU committed 13 days ago

Commit 4c18a9c • 1 Parent(s): c22fc73

Update README.md

Files changed (1)

README.md +1 -1

@@ -39,7 +39,7 @@ pipeline_tag: text-generation
 - All quants have also been upgraded with "more bits" for output tensor (all set at Q8_0) and embed for better performance (this is in addition to the "refresh")
 - New specialized quants (in addition to the new refresh/upgrades): "max, max-cpu" (will include this in the file name) for quants "Q2K" (max cpu only), "IQ4_XS", "Q6_K" and "Q8_0"
 - "MAX": output tensor / embed at float 16. (better instruction following/output generation than standard quants)
-- "MAX-CPU": output tensor / embed at bfloat 16, which forces these on to the CPU (Nvidia cards / other will vary), this frees up vram at cost of token/second and you get better instruction following/output generation too.
+- "MAX-CPU": output tensor / embed at bfloat 16, which forces these on to the CPU (Nvidia cards / other will vary), this frees up vram at cost of token/second and you get better instruction following/output generation too. Example: q8_0 Max-CPU : 2004 mb will load on to CPU/RAM, 7073 mb will load onto the GPU/vram. Extra Vram can be used for context.
 - Q8_0 (Max,Max-CPU) now clocks in at 9.5 bits per weight (average).
 <h2>L3-Dark-Planet-8B-GGUF</h2>
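
The Q8_0 Max-CPU example in the added README line can be checked with a little arithmetic. The sketch below is illustrative only: the two sizes are the figures quoted in the change, while the function names and the 8192 MB card are hypothetical and not taken from the README.

```python
# Sizes quoted in the README change for the Q8_0 Max-CPU quant (in MB).
CPU_RAM_MB = 2004   # output tensor / embed at bfloat16, kept on CPU/RAM
GPU_VRAM_MB = 7073  # the rest of the model, loaded onto GPU/VRAM


def offload_split(cpu_mb: int, gpu_mb: int) -> dict:
    """Return the total size and the share held on each device."""
    total = cpu_mb + gpu_mb
    return {
        "total_mb": total,
        "cpu_share": cpu_mb / total,
        "gpu_share": gpu_mb / total,
    }


def headroom_mb(vram_mb: int, gpu_mb: int) -> int:
    """VRAM left over for context after the GPU part of the model is loaded."""
    return vram_mb - gpu_mb


split = offload_split(CPU_RAM_MB, GPU_VRAM_MB)
print(f"total: {split['total_mb']} MB")  # total: 9077 MB
print(f"CPU/RAM: {split['cpu_share']:.1%}, GPU/VRAM: {split['gpu_share']:.1%}")

# Hypothetical 8192 MB card (an assumption, not a figure from the README).
print(f"headroom: {headroom_mb(8192, GPU_VRAM_MB)} MB")  # headroom: 1119 MB
```

On the hypothetical 8192 MB card, the full 9077 MB would not fit in VRAM, whereas the 7073 MB GPU portion leaves about 1119 MB free. This ignores runtime overhead and the KV cache itself, so treat it as a rough estimate.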