eryk-mazus committed
Commit
a3ec953
1 Parent(s): e2d9808

Update README.md

Files changed (1)
  1. README.md +2 -4
README.md CHANGED
@@ -22,6 +22,8 @@ prompt_template: '<|im_start|>system
 
  *I've copy-pasted some information from TheBloke's model cards, hope it's ok*
 
+ For a model of this size, quality appears to decline much more under aggressive quantization than it does for larger models. Personally, I would advise sticking with `fp16` or `int8` for this model.
+
  ## Prompt template: ChatML
 
  ```
@@ -44,7 +46,3 @@ Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don
  Change `-c 2048` to the desired sequence length. For extended sequence models - e.g. 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.
 
  If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.
 
- ## Notes on performance
-
- For a model of this size, quality appears to decline much more under aggressive quantization than it does for larger models. Personally, I would advise sticking with `fp16` or `int8` for this model.
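The first hunk cuts off at the opening fence of the template block. For reference, the conventional ChatML layout (as used in TheBloke's model cards, which this README borrows from) looks like the sketch below; `{system_message}` and `{prompt}` are placeholders, not the literal contents of the truncated block:

```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```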
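The relocated note advises staying at `fp16` or `int8`. As a minimal sketch of how that looks with llama.cpp's `quantize` tool, assuming an f16 GGUF as input and treating `q8_0` as the 8-bit option (both filenames are placeholders):

```bash
# Convert an f16 GGUF to 8-bit (q8_0); filenames are hypothetical
./quantize ./model-f16.gguf ./model-q8_0.gguf q8_0
```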
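Putting together the flags named in the second hunk, a typical llama.cpp run would look roughly like the sketch below; the model filename and prompt text are assumptions, while the flags themselves are the ones the README describes:

```bash
# -ngl 32 : layers to offload to GPU; remove it if you have no GPU acceleration
# -c 2048 : desired sequence length (RoPE scaling parameters for 8K/16K/32K
#           models are read from the GGUF file by llama.cpp automatically)
./main -m ./model-q8_0.gguf -ngl 32 -c 2048 \
  -p "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant"

# For a chat-style conversation, replace `-p <PROMPT>` with `-i -ins`:
./main -m ./model-q8_0.gguf -ngl 32 -c 2048 -i -ins
```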