maddes8cht committed
Commit 04a8a25 • Parent: 094a9ad
"Update README.md"

README.md
# K-Quants in Falcon 7b models

New releases of Llama.cpp now support K-quantization for previously incompatible models, in particular all Falcon 7B models. This is achieved by employing a fallback solution for model layers that cannot be quantized with real K-quants.

For Falcon 7B models, although only a quarter of the layers can be quantized with true K-quants, this approach still benefits from utilizing *different* legacy quantization types Q4_0, Q4_1, Q5_0, and Q5_1. As a result, it offers better quality at the same file size or smaller file sizes with comparable performance.

This solution therefore ensures improved performance and efficiency over the legacy Q4_0, Q4_1, Q5_0, and Q5_1 quantizations.
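As a sketch of how such a mixed quantization is produced with llama.cpp's own tooling (the file names here are placeholders, and the `quantize` binary must first be built from the llama.cpp repository):

```shell
# Quantize an f16 GGUF export of Falcon 7B to a K-quant type.
# Layers that cannot take true K-quants automatically fall back
# to a compatible legacy quantization type.
./quantize falcon-7b-f16.gguf falcon-7b-Q4_K_M.gguf Q4_K_M
```

The resulting file can then be loaded like any other GGUF model by llama.cpp-based software.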

As previously noted on the [Llama.cpp GitHub repository](https://github.com/ggerganov/llama.cpp#hot-topics), all new Llama.cpp releases after October 18, 2023, required re-quantization due to the implementation of the new BPE tokenizer.

This re-quantization process for Falcon models is now complete, and the latest quantized models are available here for download. To ensure continued compatibility with recent llama.cpp software, you need to update your Falcon models.
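For example, an updated file can be fetched with the Hugging Face CLI (the repository ID and file name below are placeholders for the actual model you use):

```shell
# Download a re-quantized GGUF file from the Hugging Face Hub
# (requires: pip install -U huggingface_hub)
huggingface-cli download <repo-id> <filename>.gguf --local-dir ./models
```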

- **Stay Informed:** Keep an eye on the release schedules of software applications that use llama.cpp libraries.
- **Monitor Upload Times:** Re-quantization is complete. Watch for updates on my Hugging Face model pages.

This change only affects **Falcon** and **Starcoder** models; other models remain unaffected.