Update README.md
README.md
CHANGED
@@ -51,21 +51,21 @@ No imatrix model
 This is a model quantized without using imatrix.
 imatrixを使わずに量子化したモデルです。
 
-quantizations variation M
+quantizations variation M(5.76 GB)
 This is the standard Q4_K_M model.
 通常のQ4_K_Mモデルです
 Example:
 ```llama-quantize gemma-2-9B-it-BF16.gguf gemma-2-9b-it-Q4_K_M.gguf Q4_k_m```
 
-quantizations variation fp16
+quantizations variation fp16(6.84 GB)
 Quantization method for making output and embed tensors fp16, invented by [ZeroWw](https://huggingface.co/RobertSinclair).
 [ZeroWw](https://huggingface.co/RobertSinclair)が考案したoutputとembed tensorsをfp16にする量子化手法です
 Example:
 ```llama-quantize --allow-requantize --output-tensor-type f16 --token-embedding-type f16 --imatrix imatrix.dat gemma-2-9B-it-BF16.gguf gemma-2-9b-it-Q4_K_M-fp16.gguf Q4_k_m```
 
-quantizations variation L
+quantizations variation L(5.98 GB)
 A method often used by Bartowski for his own models, where fp16 is set to q8_0.
-bartowskiが自モデルに良く使用している手法で、fp16をq8_0
+bartowskiが自モデルに良く使用している手法で、fp16をq8_0にした量子化手法です
 Example:
 ```llama-quantize --allow-requantize --output-tensor-type q8_0 --token-embedding-type q8_0 --imatrix imatrix.dat gemma-2-9B-it-BF16.gguf gemma-2-9b-it-Q4_K_L.gguf Q4_k_m```
 
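The fp16 and L examples above pass `--imatrix imatrix.dat`, but this hunk does not show where that file comes from. As a minimal sketch (not part of the original card), such a file is typically produced with llama.cpp's llama-imatrix tool from a plain-text calibration corpus; `calibration.txt` below is a placeholder name.

```
# Sketch only: generate the importance matrix consumed by the --imatrix examples above.
# calibration.txt is a placeholder for any plain-text calibration corpus
# (for this card, e.g. a mix of English and Japanese text).
llama-imatrix -m gemma-2-9B-it-BF16.gguf -f calibration.txt -o imatrix.dat
```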
@@ -84,8 +84,10 @@ Example:
 ### Considerations
 - It seems that imatrix is effective in all cases.
 - If you want to improve the performance of languages other than English even a little, it seems worth adding other languages. However, there is a possibility that your English ability may decrease.
+- If you are only using English, the quantization variations may not make much difference.
 - 全てのケースにおいてimatrixは有効であるようです
 - 英語以外の言語の性能を少しでも向上させたい場合は他言語を追加する価値はありそうです。しかし、英語能力が下がる可能性があります。
+- 英語だけを使っている場合、量子化のバリエーションは大きな違いがない可能性があります
 
 ### Other references
 The following information may be helpful in your further exploration.
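One way to check the observation that imatrix helps in all cases is to compare perplexity across the variants. A minimal sketch using llama.cpp's llama-perplexity tool, assuming a held-out plain-text file `eval.txt` (placeholder name):

```
# Sketch only: compare perplexity of two quantized variants on the same held-out text.
# eval.txt is a placeholder evaluation file; lower perplexity is better.
llama-perplexity -m gemma-2-9b-it-Q4_K_M.gguf -f eval.txt
llama-perplexity -m gemma-2-9b-it-Q4_K_L.gguf -f eval.txt
```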
@@ -96,13 +98,14 @@ The following information may be helpful in your further exploration.
 - [GGUFって結局どのサイズ選んだらいいの??](https://zenn.dev/yuki127/articles/e3337c176d27f2)
 
 ### Acknowledgements
+Thanks to the llama.cpp community.
+llama.cppのコミュニティの皆さんに感謝します。
 Thanks to u/noneabove1182 for the advice and motivation.
 アドバイスとモチベーションをくれたu/noneabove1182に感謝します
 
 I do not know all the inventors of each method, so please point out any that I have missed.
 各手法の考案者については私はすべてを把握できているわけではないので漏れていたら指摘してください
 
-
 - **Developed by:** [dahara1@webbigdata]
 - **Language(s) (NLP):** [English, Japanese]
 - **Finetuned from model [optional]:** [gemma-2-9b-it]
@@ -115,24 +118,3 @@ I do not know all the inventors of each method, so please point out any that I h
 
 [More Information Needed]
 
-**APA:**
-
-[More Information Needed]
-
-## Glossary [optional]
-
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
-[More Information Needed]
-
-## More Information [optional]
-
-[More Information Needed]
-
-## Model Card Authors [optional]
-
-[More Information Needed]
-
-## Model Card Contact
-
-[More Information Needed]
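As a quick sanity check of any file produced by the quantize commands above, a minimal sketch with llama.cpp's llama-cli (prompt and token count are arbitrary placeholders):

```
# Sketch only: run a short generation with one of the quantized models.
# -p sets the prompt, -n limits the number of generated tokens.
llama-cli -m gemma-2-9b-it-Q4_K_M.gguf -p "こんにちは。自己紹介してください。" -n 128
```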