make readme example code more readable
README.md CHANGED
@@ -42,9 +42,9 @@ We used [200k query-document pairs](https://huggingface.co/datasets/nixiesearch/
 
 This repo has multiple versions of the model:
 
-* model-*.safetensors: FP16 checkpoint, suitable for down-stream fine-tuning
-* ggml-model-f16.gguf: F16 non-quantized llama-cpp checkpoint, for CPU inference
-* ggml-model-q4.gguf: Q4_0 quantized llama-cpp checkpoint, for fast (and less precise) CPU inference.
+* model-*.safetensors: PyTorch FP16 checkpoint, suitable for downstream fine-tuning
+* ggml-model-f16.gguf: GGUF F16 non-quantized [llama-cpp](https://github.com/ggerganov/llama.cpp) checkpoint, for CPU inference
+* ggml-model-q4.gguf: GGUF Q4_0 quantized [llama-cpp](https://github.com/ggerganov/llama.cpp) checkpoint, for fast (and less precise) CPU inference.
 
 ## Prompt formats
 
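For reference, the two GGUF files can be produced from the PyTorch checkpoint with llama.cpp's own tooling. A minimal sketch, assuming a 2023-era llama.cpp checkout where the converter lives at `convert.py` and the quantizer binary is `./quantize` (both have moved around between llama.cpp versions, so treat this as illustrative rather than the exact commands used):

```bash
# Convert the PyTorch/safetensors checkpoint to a non-quantized F16 GGUF
python convert.py ~/models/nixie-querygen-v2 --outtype f16 --outfile ggml-model-f16.gguf

# Quantize F16 -> Q4_0 for faster (and less precise) CPU inference
./quantize ggml-model-f16.gguf ggml-model-q4.gguf q4_0
```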
@@ -61,12 +61,14 @@ Some notes on format:
 
 ## Inference example
 
-With llama-cpp and Q4 model the inference can be done on a CPU:
+With [llama-cpp](https://github.com/ggerganov/llama.cpp) and the Q4 model, inference can be done on a CPU:
 
 ```bash
-$ ./main -m ~/models/nixie-querygen-v2/ggml-model-q4.gguf -p "git lfs track will begin tracking a new file or an existing file that is already checked in to your repository. When you run git lfs track and then commit that change, it will update the file, replacing it with the LFS pointer contents. short query:" -s 1
+$ ./main -m ~/models/nixie-querygen-v2/ggml-model-q4.gguf -p "git lfs track will begin tracking \
+a new file or an existing file that is already checked in to your repository. When you run git \
+lfs track and then commit that change, it will update the file, replacing it with the LFS \
+pointer contents. short query:" -s 1
 
-system_info: n_threads = 8 / 16 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
 sampling:
 repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
 top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
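The prompt is simply the document text followed by the `short query:` suffix, so the document can also be read from a file instead of being pasted inline. A hedged variant of the call above, assuming the document lives in a hypothetical `doc.txt` (`-t`, `-n` and `--temp` are standard llama.cpp `main` flags; the values are illustrative):

```bash
# Read the document from doc.txt (hypothetical file), append the prompt suffix,
# pin 8 threads, cap generation at 32 tokens, and fix the seed for reproducibility
./main -m ~/models/nixie-querygen-v2/ggml-model-q4.gguf \
  -t 8 -n 32 --temp 0.8 -s 1 \
  -p "$(cat doc.txt) short query:"
```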
@@ -76,7 +78,9 @@ CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temp
 
 generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0
 
-git lfs track will begin tracking a new file or an existing file that is already checked in to your repository. When you run git lfs track and then commit that change, it will update the file, replacing it with the LFS pointer contents. short regular query: git-lfs track [end of text]
+git lfs track will begin tracking a new file or an existing file that is already checked in to your
+repository. When you run git lfs track and then commit that change, it will update the file,
+replacing it with the LFS pointer contents. short regular query: git-lfs track [end of text]
 ```
 
 ## Training config