The approach boosts their performance on SEA languages while maintaining proficiency in English and Chinese without significant compromise.
Finally, we continually pre-train the Qwen1.5-0.5B model with 400 billion tokens, and the other models with 200 billion tokens, to obtain the Sailor models.

### GGUF model list

| Name | Quant method | Bits | Size | Use case |
| ------------------------------------------------------------ | ------------ | ---- | -------- | -------------------------------------- |
| [ggml-model-Q2_K.gguf](https://huggingface.co/sail/Sailor-7B-Chat-gguf/blob/main/ggml-model-Q2_K.gguf) | Q2_K | 2 | 3.10 GB | medium, significant quality loss |
| [ggml-model-Q3_K_L.gguf](https://huggingface.co/sail/Sailor-7B-Chat-gguf/blob/main/ggml-model-Q3_K_L.gguf) | Q3_K_L | 3 | 4.22 GB | large, substantial quality loss |
| [ggml-model-Q3_K_M.gguf](https://huggingface.co/sail/Sailor-7B-Chat-gguf/blob/main/ggml-model-Q3_K_M.gguf) | Q3_K_M | 3 | 3.92 GB | medium, balanced quality |
| [ggml-model-Q3_K_S.gguf](https://huggingface.co/sail/Sailor-7B-Chat-gguf/blob/main/ggml-model-Q3_K_S.gguf) | Q3_K_S | 3 | 3.57 GB | medium, high quality loss |
| [ggml-model-Q4_K_M.gguf](https://huggingface.co/sail/Sailor-7B-Chat-gguf/blob/main/ggml-model-Q4_K_M.gguf) | Q4_K_M | 4 | 4.77 GB | large, balanced quality |
| [ggml-model-Q4_K_S.gguf](https://huggingface.co/sail/Sailor-7B-Chat-gguf/blob/main/ggml-model-Q4_K_S.gguf) | Q4_K_S | 4 | 4.54 GB | large, greater quality loss |
| [ggml-model-Q5_K_M.gguf](https://huggingface.co/sail/Sailor-7B-Chat-gguf/blob/main/ggml-model-Q5_K_M.gguf) | Q5_K_M | 5 | 5.53 GB | large, balanced quality |
| [ggml-model-Q5_K_S.gguf](https://huggingface.co/sail/Sailor-7B-Chat-gguf/blob/main/ggml-model-Q5_K_S.gguf) | Q5_K_S | 5 | 5.4 GB | large, very low quality loss |
| [ggml-model-Q6_K.gguf](https://huggingface.co/sail/Sailor-7B-Chat-gguf/blob/main/ggml-model-Q6_K.gguf) | Q6_K | 6 | 6.34 GB | large, extremely low quality loss |
| [ggml-model-Q8_0.gguf](https://huggingface.co/sail/Sailor-7B-Chat-gguf/blob/main/ggml-model-Q8_0.gguf) | Q8_0 | 8 | 8.21 GB | very large, extremely low quality loss |
| [ggml-model-f16.gguf](https://huggingface.co/sail/Sailor-7B-Chat-gguf/blob/main/ggml-model-f16.gguf) | f16 | 16 | 15.40 GB | very large, no quality loss |
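Any single quantization from the table can be fetched on its own before running the model. A minimal sketch, assuming the `huggingface_hub` CLI is installed; `Q4_K_M` is chosen here only as a common size/quality trade-off, and any filename from the table works in its place:

```shell
# Install the Hugging Face Hub CLI (skip if already available)
pip install -U "huggingface_hub[cli]"

# Download one quantization file from the repo into the current directory;
# swap the filename for any other row in the table above.
huggingface-cli download sail/Sailor-7B-Chat-gguf ggml-model-Q4_K_M.gguf --local-dir .
```

Smaller quantizations (Q2_K, Q3_K_*) trade quality for memory, while Q6_K, Q8_0, and f16 keep more quality at a larger download size, per the use-case column above.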

### How to run with `llama.cpp`

```shell