---
tags:
- GGUF
- iMat
- Llama3
- conversational
---

```
  e88 88e                               d8
 d888 888b  8888 8888   ,"Y88b  888 8e  d88
C8888 8888D 8888 8888  "8" 888  888 88b d88888
 Y888 888P  Y888 888P  ,ee 888  888 888  888
  "88 88"    "88 88"   "88 888  888 888  888
      b
      8b,

    e88'Y88                  d8            888
   d888  'Y  ,"Y88b  888,8,  d88    ,e e,  888
  C8888     "8" 888  888 "  d88888 d88 88b 888
   Y888  ,d  ,ee 888 888     888   888   , 888
    "88,d88  "88 888 888     888    "YeeP" 888

                              PROUDLY PRESENTS
```

## experiment_2_8b-iMat-GGUF

<b>Quantization Notes: Quantized from the 3500 checkpoint. For best results, use a repetition penalty (--repeat-penalty in llama.cpp) of ~1.15 with Q6_K and lower quants, and ~1.18 with IQ3_M and lower.</b>
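
As a concrete illustration, a llama.cpp run with these settings might look like the sketch below. The model filenames, prompt, and token count are placeholders rather than exact files from this repo, and the binary name follows current llama.cpp builds:

```bash
# Illustrative only: filenames and everything besides the penalty are placeholders.

# ~1.15 for Q6_K and lower quants
./llama-cli -m experiment_2_8b-Q6_K.gguf --repeat-penalty 1.15 -p "Hello there!" -n 256

# ~1.18 for IQ3_M and lower quants
./llama-cli -m experiment_2_8b-IQ3_M.gguf --repeat-penalty 1.18 -p "Hello there!" -n 256
```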

Quantized from fp16 with love.

* Weighted quantizations were created from the fp16 GGUF using [groups_merged-enhancedV2-TurboMini.txt](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-9432658) with 189 chunks and n_ctx=512 (a sketch of the commands involved is shown after this list)
* This method of calculating the importance matrix showed improvements in some areas for Mistral 7b and Llama3 8b models; see the linked post for details
* The enhancedV2-TurboMini file appends snippets from turboderp's calibration data to the standard groups_merged.txt file
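
For those curious about the workflow, a rough sketch of how an importance matrix and a weighted quant are produced with llama.cpp's imatrix and quantize tools follows. The chunk count and context size mirror the settings above; the file names and the IQ3_M target are illustrative:

```bash
# Rough sketch of the iMatrix workflow; file names are placeholders.

# 1) Compute the importance matrix from the fp16 GGUF over the calibration file,
#    processing 189 chunks at a context length of 512 tokens.
./llama-imatrix -m experiment_2_8b-fp16.gguf \
    -f groups_merged-enhancedV2-TurboMini.txt \
    -o imatrix.dat --chunks 189 -c 512

# 2) Bake the matrix into a weighted quant (IQ3_M chosen as an example).
./llama-quantize --imatrix imatrix.dat \
    experiment_2_8b-fp16.gguf experiment_2_8b-IQ3_M.gguf IQ3_M
```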

For a brief rundown of iMatrix quant performance, please see this [PR](https://github.com/ggerganov/llama.cpp/pull/5747)

<b>All quants are verified working prior to upload for your safety and convenience.</b>

Original model card [here](https://huggingface.co/rAIfle/experiment_2_8b-fp16) and below

---

# experiment_2_8b-fp16

Another experimental training run with unsloth. This time, roughly 0.6 epochs of the cleaned c2-logs. My metaparams are probably bad, since the loss value behaved strangely toward the end. Another version, uploaded in the `checkpoint-3500` branch, may mitigate some of that.