update model quantization
- .gitattributes +0 -1
- README.md +1 -1
- ggml-model-q4.gguf → ggml-model-q4_0.gguf +0 -0
.gitattributes
CHANGED
@@ -32,5 +32,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
 *.gguf filter=lfs diff=lfs merge=lfs -text
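The attribute lines above are the kind `git lfs track` writes: this commit drops the `*tfevents*` rule (TensorBoard event files) while keeping GGUF weights under LFS. A minimal sketch of how the surviving rule would be recreated in a standard Git LFS setup, in case the repository is being rebuilt from scratch (the commit message below is illustrative, not from this repo):

```
# Make sure Git LFS hooks are installed for this clone (no-op if already set up).
git lfs install

# Track GGUF weights; this appends
#   *.gguf filter=lfs diff=lfs merge=lfs -text
# to .gitattributes, i.e. the rule kept by this commit.
git lfs track "*.gguf"

# Commit the attributes file so other clones pick up the rule.
git add .gitattributes
git commit -m "track GGUF weights with LFS"
```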
README.md
CHANGED
@@ -59,7 +59,7 @@ Fine-tuning took ~70 minutes on a single RTX 4090.
 This model can be run with a [llama-cpp](https://github.com/ggerganov/llama.cpp) on a CPU using the following command:
 
 ```
-./main -n 64 -m models/ggml-model-
+./main -n 64 -m models/ggml-model-q4_0.gguf -p "[INST] My girlfriend changed after she became a vegetarian. [/INST]"
 
 system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
 sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
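For context, the `q4_0` suffix in the new filename matches llama.cpp's 4-bit `q4_0` quantization type, which is what the updated command loads. A rough sketch of how such a file is typically produced from a fine-tuned checkpoint with llama.cpp's stock tools of the same era as the `./main` binary above; the paths and the intermediate f16 filename are assumptions, not taken from this repository:

```
# Convert the fine-tuned Hugging Face checkpoint to an f16 GGUF (assumed intermediate file).
python convert.py path/to/hf-model --outfile models/ggml-model-f16.gguf

# Quantize to 4-bit; the trailing argument selects the q4_0 quantization type,
# which is where the ggml-model-q4_0.gguf name comes from.
./quantize models/ggml-model-f16.gguf models/ggml-model-q4_0.gguf q4_0

# Smoke-test on CPU, mirroring the README command.
./main -n 64 -m models/ggml-model-q4_0.gguf -p "[INST] Hello [/INST]"
```

Keeping the quantization type in the filename (q4_0 rather than a bare q4) also avoids ambiguity with llama.cpp's other 4-bit variants such as q4_1 and q4_K.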
ggml-model-q4.gguf → ggml-model-q4_0.gguf
RENAMED
File without changes