maddes8cht committed
Commit e2de4b2
1 Parent(s): 1e68722

"Update README.md"

Files changed (1):
  1. README.md +7 -19
README.md CHANGED

```diff
@@ -21,21 +21,7 @@ I'm constantly enhancing these model descriptions to provide you with the most r
 - Model creator: [mosaicml](https://huggingface.co/mosaicml)
 - Original model: [mpt-7b-8k](https://huggingface.co/mosaicml/mpt-7b-8k)
 
-# Important Update for Falcon Models in llama.cpp Versions After October 18, 2023
-
-As noted on the [Llama.cpp GitHub repository](https://github.com/ggerganov/llama.cpp#hot-topics), all new Llama.cpp releases after October 18, 2023, will require a re-quantization due to the new BPE tokenizer.
-
-**Good news!** I am glad that my re-quantization process for Falcon Models is nearly complete. Download the latest quantized models to ensure compatibility with recent llama.cpp software.
-
-**Key Points:**
-
-- **Stay Informed:** Keep an eye on software application release schedules using llama.cpp libraries.
-- **Monitor Upload Times:** Re-quantization is *almost* done. Watch for updates on my Hugging Face Model pages.
-
-**Important Compatibility Note:** Old software will work with old Falcon models, but expect updated software to exclusively support the new models.
-
-This change primarily affects **Falcon** and **Starcoder** models, with other models remaining unaffected.
-
+MPT-7b and MPT-30B are part of the family of Mosaic Pretrained Transformer (MPT) models, which use a modified transformer architecture optimized for efficient training and inference.
 
 
 
@@ -47,19 +33,21 @@ The core project making use of the ggml library is the [llama.cpp](https://githu
 
 # Quantization variants
 
-There is a bunch of quantized files available. How to choose the best for you:
+There is a bunch of quantized files available to cater to your specific needs. Here's how to choose the best option for you:
 
 # Legacy quants
 
 Q4_0, Q4_1, Q5_0, Q5_1 and Q8 are `legacy` quantization types.
 Nevertheless, they are fully supported, as there are several circumstances that cause certain models not to be compatible with the modern K-quants.
-Falcon 7B models cannot be quantized to K-quants.
+## Note:
+Now there's a new option to use K-quants even for previously 'incompatible' models, although this involves some fallback solution that makes them not *real* K-quants. More details can be found in affected model descriptions.
+(This mainly refers to Falcon 7b and Starcoder models)
 
 # K-quants
 
-K-quants are based on the idea that the quantization of certain parts affects the quality in different ways. If you quantize certain parts more and others less, you get a more powerful model with the same file size, or a smaller file size and lower memory load with comparable performance.
+K-quants are designed with the idea that different levels of quantization in specific parts of the model can optimize performance, file size, and memory load.
 So, if possible, use K-quants.
-With a Q6_K you should find it really hard to find a quality difference to the original model - ask your model two times the same question and you may encounter bigger quality differences.
+With a Q6_K, you'll likely find it challenging to discern a quality difference from the original model - ask your model two times the same question and you may encounter bigger quality differences.
 
 
 
```
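The trade-off described under "Legacy quants" and "K-quants" in the README text above can be made concrete with a toy sketch. The block-of-32, one-scale-per-block scheme below mirrors the spirit of Q4_0 but is not llama.cpp's actual code; the tensor names, sizes, and the "sensitive vs. bulk" split are made up for illustration, and per-block scale overhead is ignored.

```python
# Toy illustration of blockwise quantization and of the K-quant idea of
# spending more bits on sensitive tensors. Not llama.cpp's implementation.
import numpy as np

def quantize_blockwise(w: np.ndarray, bits: int, block: int = 32) -> np.ndarray:
    """Symmetric round-to-nearest quantization, one scale per block of 32."""
    qmax = 2 ** (bits - 1) - 1                # e.g. 7 for 4-bit, 31 for 6-bit
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0                   # avoid division by zero
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return (q * scale).reshape(-1)            # dequantized reconstruction

rng = np.random.default_rng(0)
# Hypothetical tensors: attention outputs are often more quantization-sensitive
# than the bulk of the feed-forward weights.
tensors = {"attn_output": rng.normal(size=32 * 256),
           "ffn_weights": rng.normal(size=32 * 2048)}

for name, plan in [("uniform 4-bit (legacy-style)", {"attn_output": 4, "ffn_weights": 4}),
                   ("mixed 6/4-bit (K-quant idea)", {"attn_output": 6, "ffn_weights": 4})]:
    mse = sum(np.mean((t - quantize_blockwise(t, plan[n])) ** 2)
              for n, t in tensors.items())
    payload = sum(plan[n] * t.size for n, t in tensors.items())
    print(f"{name}: ~{payload / 8 / 1024:.1f} KiB of weights, summed MSE {mse:.5f}")
```

Running this, the mixed plan pays a few percent in payload size for a large drop in error on the sensitive tensor; that is the trade the K-quants make inside a single file.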
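Whichever variant you pick, the files load the same way. A minimal usage sketch, assuming the llama-cpp-python bindings; the file name is a placeholder for whichever quantized .gguf from this repository you downloaded:

```python
# Minimal sketch with llama-cpp-python; the model file name is hypothetical.
from llama_cpp import Llama

llm = Llama(model_path="mpt-7b-8k-Q4_K_M.gguf", n_ctx=8192)  # 8k-context model
out = llm("What is blockwise quantization?", max_tokens=128)
print(out["choices"][0]["text"])
```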