SerialKicked committed
Commit: 8615360
1 Parent(s): 1da6a85

Update README.md

Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -18,16 +18,17 @@ Simply put, I'm making my methodology to evaluate RP models public. While none o
 - Frontend is staging version of Silly Tavern.
 - Backend is the latest version of KoboldCPP for Windows using CUDA 12.
 - Using **CuBLAS** but **not using QuantMatMul (mmq)**.
+- Fixed Seed for all tests: **123**
 - **7-10B Models:**
 - All models are loaded in Q8_0 (GGUF)
-- All models are extended to **16K context length** (auto rope from KCPP)
 - **Flash Attention** and **ContextShift** enabled.
+- All models are extended to **16K context length** (auto rope from KCPP)
+- Response size set to 1024 tokens max.
 - **11-15B Models:**
 - All models are loaded in Q4_KM or whatever is the highest/closest available (GGUF)
-- All models are extended to **12K context length** (auto rope from KCPP)
 - **Flash Attention** and **8Bit cache compression** are enabled.
-- Response size set to 1024 tokens max.
-- Fixed Seed for all tests: **123**
+- All models are extended to **12K context length** (auto rope from KCPP)
+- Response size set to 512 tokens max.
 
 
 # System Prompt and Instruct Format
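
For context, a minimal sketch of how the backend settings described in this diff could be reproduced when launching KoboldCPP from Python. The flags used (`--usecublas`, `--flashattention`, `--contextsize`, `--quantkv`, `--gpulayers`) come from KoboldCPP's CLI, but the helper function, model path, and GPU layer count are illustrative assumptions, not part of the commit; response length and the fixed seed (123) are presumably set in the SillyTavern frontend rather than at launch.

```python
# Hypothetical launcher sketch: starts KoboldCPP with the settings listed above.
# Flag names follow KoboldCPP's CLI; verify them against your KoboldCPP version.
import subprocess


def launch_koboldcpp(model_path: str, context_size: int, eight_bit_cache: bool) -> subprocess.Popen:
    """Launch KoboldCPP with CuBLAS (no MMQ) and Flash Attention at a fixed context size."""
    cmd = [
        "python", "koboldcpp.py",
        "--model", model_path,               # Q8_0 (7-10B) or Q4_K_M-class (11-15B) GGUF file
        "--usecublas",                       # CuBLAS backend; "mmq" sub-option deliberately omitted
        "--flashattention",                  # Flash Attention, enabled for both model tiers
        "--contextsize", str(context_size),  # 16384 for 7-10B models, 12288 for 11-15B models
        "--gpulayers", "99",                 # placeholder: offload all layers to the GPU
    ]
    if eight_bit_cache:
        cmd += ["--quantkv", "1"]            # 8-bit KV cache compression (11-15B tier)
    return subprocess.Popen(cmd)


# 7-10B tier example (ContextShift stays on by default when the KV cache is not quantized):
# launch_koboldcpp("my-model.Q8_0.gguf", 16384, eight_bit_cache=False)
```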