dahara1 committed on
Commit 896a3bd
Parent: 2613fbc

Update README.md

Files changed (1): README.md (+9 -27)
README.md CHANGED
@@ -51,21 +51,21 @@ No imatrix model
This is a model quantized without using imatrix.

- quantizations variation M
+ quantizations variation M (5.76 GB)
This is the standard Q4_K_M model.
Example:
```llama-quantize gemma-2-9B-it-BF16.gguf gemma-2-9b-it-Q4_K_M.gguf Q4_k_m```
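The examples in this section all start from `gemma-2-9B-it-BF16.gguf`. If that file still needs to be created, a minimal sketch using llama.cpp's convert_hf_to_gguf.py (the local ./gemma-2-9b-it directory is an assumed download location, not something specified in this README):
```python convert_hf_to_gguf.py ./gemma-2-9b-it --outtype bf16 --outfile gemma-2-9B-it-BF16.gguf  # ./gemma-2-9b-it is an assumed local checkout of the HF model```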

- quantizations variation fp16
+ quantizations variation fp16 (6.84 GB)
A quantization method, invented by [ZeroWw](https://huggingface.co/RobertSinclair), that keeps the output and embed tensors in fp16.
Example:
```llama-quantize --allow-requantize --output-tensor-type f16 --token-embedding-type f16 --imatrix imatrix.dat gemma-2-9B-it-BF16.gguf gemma-2-9b-it-Q4_K_M-fp16.gguf Q4_k_m```
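The `--imatrix imatrix.dat` argument above expects an importance matrix computed beforehand. A minimal sketch with llama.cpp's llama-imatrix tool, assuming a plain-text calibration corpus (the file name calibration.txt is hypothetical):
```llama-imatrix -m gemma-2-9B-it-BF16.gguf -f calibration.txt -o imatrix.dat  # calibration.txt is a hypothetical calibration corpus```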

- quantizations variation L
+ quantizations variation L (5.98 GB)
A method Bartowski often uses for his own models, where the output and embed tensors are set to q8_0 instead of fp16.
Example:
```llama-quantize --allow-requantize --output-tensor-type q8_0 --token-embedding-type q8_0 --imatrix imatrix.dat gemma-2-9B-it-BF16.gguf gemma-2-9b-it-Q4_K_L.gguf Q4_k_m```
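As a quick sanity check (a suggested smoke test, not a step from the original workflow), any of the resulting files can be loaded with llama.cpp's llama-cli to confirm the quantized model still generates text:
```llama-cli -m gemma-2-9b-it-Q4_K_L.gguf -p "Hello" -n 32  # smoke test: load the quantized file and generate a few tokens```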
 
@@ -84,8 +84,10 @@ Example:
### Considerations
- It seems that imatrix is effective in all cases.
- If you want to improve performance in languages other than English even a little, it seems worth adding those languages to the imatrix calibration data; however, the model's English ability may degrade (see the sketch after this list).
+ - If you are only using English, the quantization variations may not make much difference.
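One way to act on the second bullet above, as a sketch only: concatenate plain-text corpora in each target language before computing the imatrix (the per-language file names here are hypothetical):
```cat calibration_en.txt calibration_ja.txt > calibration.txt && llama-imatrix -m gemma-2-9B-it-BF16.gguf -f calibration.txt -o imatrix.dat  # hypothetical per-language calibration files```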
### Other references
The following information may be helpful in your further exploration.
@@ -96,13 +98,14 @@ The following information may be helpful in your further exploration.
- [GGUFって結局どのサイズ選んだらいいの??](https://zenn.dev/yuki127/articles/e3337c176d27f2) (in Japanese: "So which GGUF size should I choose, anyway?")

### Acknowledgements
+ Thanks to the llama.cpp community.
Thanks to u/noneabove1182 for the advice and motivation.

I do not know the inventors of every method, so please point out any I have missed.
-
- **Developed by:** [dahara1@webbigdata]
- **Language(s) (NLP):** [English, Japanese]
- **Finetuned from model [optional]:** [gemma-2-9b-it]
@@ -115,24 +118,3 @@ I do not know the inventors of every method, so please point out any I have missed.

[More Information Needed]

- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
 
120