SerialKicked committed
Commit 1da6a85
1 Parent(s): 8802cae

Update README.md

Files changed (1)
  1. README.md +9 -8
README.md CHANGED
@@ -15,16 +15,17 @@ Simply put, I'm making my methodology to evaluate RP models public. While none o

  # Testing Environment

- - All models are loaded in Q8_0 (GGUF) with all layers on the GPU (NVidia RTX3060 12GB)
+ - Frontend is staging version of Silly Tavern.
  - Backend is the latest version of KoboldCPP for Windows using CUDA 12.
  - Using **CuBLAS** but **not using QuantMatMul (mmq)**.
- - 7-10B Models
- - - All models are extended to **16K context length** (auto rope from KCPP)
- - - **Flash Attention** and **ContextShift** enabled.
- - 11-15B Models:
- - - All models are extended to **12K context length** (auto rope from KCPP)
- - - **Flash Attention** and **8Bit cache compression** are enabled.
- - Frontend is staging version of Silly Tavern.
+ - **7-10B Models:**
+ - All models are loaded in Q8_0 (GGUF)
+ - All models are extended to **16K context length** (auto rope from KCPP)
+ - **Flash Attention** and **ContextShift** enabled.
+ - **11-15B Models:**
+ - All models are loaded in Q4_KM or whatever is the highest/closest available (GGUF)
+ - All models are extended to **12K context length** (auto rope from KCPP)
+ - **Flash Attention** and **8Bit cache compression** are enabled.
  - Response size set to 1024 tokens max.
  - Fixed Seed for all tests: **123**

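For readers who want to reproduce a comparable setup, here is a minimal launch sketch matching the 7-10B settings described above. It is an assumption, not part of the commit: the flag names follow KoboldCPP's command-line options (`--usecublas`, `--gpulayers`, `--contextsize`, `--flashattention`, `--quantkv`), and the model filename is a placeholder.

```python
import subprocess

# Hypothetical KoboldCPP launch mirroring the 7-10B test settings above.
# The model path is a placeholder; adjust it to the GGUF file under test.
cmd = [
    "python", "koboldcpp.py",
    "--model", "some-7b-model.Q8_0.gguf",  # placeholder; 7-10B models are tested at Q8_0
    "--usecublas",             # CuBLAS backend; "mmq" is simply not passed, so QuantMatMul stays off
    "--gpulayers", "99",       # offload all layers to the GPU (RTX 3060 12GB)
    "--contextsize", "16384",  # 16K context for 7-10B models (12K for 11-15B)
    "--flashattention",        # Flash Attention on; ContextShift is on by default
    # "--quantkv", "1",        # 8-bit KV cache compression, used only for the 11-15B runs
]
subprocess.run(cmd, check=True)
```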