Update README.md
README.md
CHANGED
@@ -51,21 +51,21 @@ No imatrix model
 This is a model quantized without using imatrix.
 imatrixを使わずに量子化したモデルです。
 
-quantizations variation M
+quantizations variation M(5.76 GB)
 This is the standard Q4_K_M model.
 通常のQ4_K_Mモデルです
 Example:
 ```llama-quantize gemma-2-9B-it-BF16.gguf gemma-2-9b-it-Q4_K_M.gguf Q4_k_m```
 
-quantizations variation fp16
+quantizations variation fp16(6.84 GB)
 Quantization method for making output and embed tensors fp16, invented by [ZeroWw](https://huggingface.co/RobertSinclair).
 [ZeroWw](https://huggingface.co/RobertSinclair)が考案したoutputとembed tensorsをfp16にする量子化手法です
 Example:
 ```llama-quantize --allow-requantize --output-tensor-type f16 --token-embedding-type f16 --imatrix imatrix.dat gemma-2-9B-it-BF16.gguf gemma-2-9b-it-Q4_K_M-fp16.gguf Q4_k_m```
 
-quantizations variation L
+quantizations variation L(5.98 GB)
 A method often used by Bartowski for his own models, where fp16 is set to q8_0.
-bartowskiが自モデルに良く使用している手法で、fp16をq8_0
+bartowskiが自モデルに良く使用している手法で、fp16をq8_0にした量子化手法です
 Example:
 ```llama-quantize --allow-requantize --output-tensor-type q8_0 --token-embedding-type q8_0 --imatrix imatrix.dat gemma-2-9B-it-BF16.gguf gemma-2-9b-it-Q4_K_L.gguf Q4_k_m```
 
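The fp16 and L examples above pass `--imatrix imatrix.dat`, but this hunk does not show where that file comes from. As a minimal sketch (not part of the original card), such a file is typically produced with llama.cpp's llama-imatrix tool from a plain-text calibration corpus; `calibration.txt` below is a placeholder name.

```
# Sketch only: generate the importance matrix consumed by the --imatrix examples above.
# calibration.txt is a placeholder for any plain-text calibration corpus
# (for this card, e.g. a mix of English and Japanese text).
llama-imatrix -m gemma-2-9B-it-BF16.gguf -f calibration.txt -o imatrix.dat
```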
@@ -84,8 +84,10 @@ Example:
 ### Considerations
 - It seems that imatrix is effective in all cases.
 - If you want to improve the performance of languages other than English even a little, it seems worth adding other languages. However, there is a possibility that your English ability may decrease.
+- If you are only using English, the quantization variations may not make much difference.
 - 全てのケースにおいてimatrixは有効であるようです
 - 英語以外の言語の性能を少しでも向上させたい場合は他言語を追加する価値はありそうです。しかし、英語能力が下がる可能性があります。
+- 英語だけを使っている場合、量子化のバリエーションは大きな違いがない可能性があります
 
 ### Other references
 The following information may be helpful in your further exploration.
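One way to check the observation that imatrix helps in all cases is to compare perplexity across the variants. A minimal sketch using llama.cpp's llama-perplexity tool, assuming a held-out plain-text file `eval.txt` (placeholder name):

```
# Sketch only: compare perplexity of two quantized variants on the same held-out text.
# eval.txt is a placeholder evaluation file; lower perplexity is better.
llama-perplexity -m gemma-2-9b-it-Q4_K_M.gguf -f eval.txt
llama-perplexity -m gemma-2-9b-it-Q4_K_L.gguf -f eval.txt
```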
@@ -96,13 +98,14 @@ The following information may be helpful in your further exploration.
 - [GGUFって結局どのサイズ選んだらいいの??](https://zenn.dev/yuki127/articles/e3337c176d27f2)
 
 ### Acknowledgements
+Thanks to the llama.cpp community.
+llama.cppのコミュニティの皆さんに感謝します。
 Thanks to u/noneabove1182 for the advice and motivation.
 アドバイスとモチベーションをくれたu/noneabove1182に感謝します
 
 I do not know all the inventors of each method, so please point out any that I have missed.
 各手法の考案者については私はすべてを把握できているわけではないので漏れていたら指摘してください
 
-
 - **Developed by:** [dahara1@webbigdata]
 - **Language(s) (NLP):** [English, Japanese]
 - **Finetuned from model [optional]:** [gemma-2-9b-it]
@@ -115,24 +118,3 @@ I do not know all the inventors of each method, so please point out any that I h
 
 [More Information Needed]
 
-**APA:**
-
-[More Information Needed]
-
-## Glossary [optional]
-
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
-[More Information Needed]
-
-## More Information [optional]
-
-[More Information Needed]
-
-## Model Card Authors [optional]
-
-[More Information Needed]
-
-## Model Card Contact
-
-[More Information Needed]
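As a quick sanity check of any file produced by the quantize commands above, a minimal sketch with llama.cpp's llama-cli (prompt and token count are arbitrary placeholders):

```
# Sketch only: run a short generation with one of the quantized models.
# -p sets the prompt, -n limits the number of generated tokens.
llama-cli -m gemma-2-9b-it-Q4_K_M.gguf -p "こんにちは。自己紹介してください。" -n 128
```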