KnutJaegersberg committed
Commit a9bfb02 · Parent(s): 41e092b
Update README.md

README.md CHANGED
@@ -7,6 +7,8 @@ pipeline_tag: text-generation
 7 |
 8 | This is a model collection of mostly larger LLMs quantized to 2 bit with the novel quip# inspired approach in llama.cpp
 9 | Sometimes both xs and xxs are available.
10 | + Note that for some larger models, like Qwen-72b based models, the context length might be too large for most GPUs, so you have to reduce it yourself in textgen-webui via the n_ctx setting.
11 | + Rope scaling for scaled models like longalpaca or yarn should be 8, set compress_pos_emb accordingly.
12 |
13 | ### Overview
14 | - Senku-70b
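The added lines set two runtime knobs: a reduced context length (n_ctx) and a rope scaling factor of 8 (compress_pos_emb in textgen-webui). A minimal sketch of how those settings translate, assuming the llama-cpp-python backend; the model filename below is a placeholder, not an actual file from this collection. To my understanding, textgen-webui's compress_pos_emb is a linear RoPE compression factor, and llama.cpp takes its reciprocal as rope_freq_scale:

```python
# Assumption: compress_pos_emb (linear RoPE scaling) maps to llama.cpp's
# rope_freq_scale as its reciprocal.
compress_pos_emb = 8
rope_freq_scale = 1.0 / compress_pos_emb  # 0.125 for compress_pos_emb = 8

# Loading would then look roughly like this (requires
# `pip install llama-cpp-python` and a real local GGUF file, so it is
# left commented out here; the path is hypothetical):
# from llama_cpp import Llama
# llm = Llama(
#     model_path="model.IQ2_XS.gguf",       # placeholder filename
#     n_ctx=4096,                           # reduced context to fit GPU memory
#     rope_freq_scale=rope_freq_scale,      # equivalent of compress_pos_emb=8
# )
print(rope_freq_scale)
```

For unscaled models in the collection, compress_pos_emb stays at 1 and only n_ctx needs lowering.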