dreamerdeo committed • Commit 6329c42 • Parent(s): 13e9069

Update README.md

README.md CHANGED
tags:
- sft
- chat
- instruction
- gguf
license: apache-2.0
base_model: sail/Sailor-4B
---
The pre-training corpus heavily leverages publicly available corpora, including
[SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B),
[SkyPile](https://huggingface.co/datasets/Skywork/SkyPile-150B),
[CC100](https://huggingface.co/datasets/cc100) and [MADLAD-400](https://huggingface.co/datasets/allenai/MADLAD-400).
The instruction tuning corpus is likewise entirely public, including
[aya_collection](https://huggingface.co/datasets/CohereForAI/aya_collection),
[aya_dataset](https://huggingface.co/datasets/CohereForAI/aya_dataset),
[OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca).
Through systematic experiments to determine the weights of different languages, this approach boosts performance on SEA languages while maintaining proficiency in English and Chinese without significant compromise.
Finally, we continually pre-train the Qwen1.5-0.5B model on 400 billion tokens, and the other models on 200 billion tokens, to obtain the Sailor models.
### GGUF model list

| Name | Quant method | Bits | Size | Use case |
| ---- | ------------ | ---- | ---- | -------- |
| [ggml-model-Q2_K.gguf](https://huggingface.co/sail/Sailor-4B-Chat-gguf/blob/main/ggml-model-Q2_K.gguf) | Q2_K | 2 | 1.62 GB | smallest, significant quality loss ❗️ not recommended for most purposes |
| [ggml-model-Q3_K_L.gguf](https://huggingface.co/sail/Sailor-4B-Chat-gguf/blob/main/ggml-model-Q3_K_L.gguf) | Q3_K_L | 3 | 2.17 GB | small, substantial quality loss |
| [ggml-model-Q3_K_M.gguf](https://huggingface.co/sail/Sailor-4B-Chat-gguf/blob/main/ggml-model-Q3_K_M.gguf) | Q3_K_M | 3 | 2.03 GB | very small, balanced quality |
| [ggml-model-Q3_K_S.gguf](https://huggingface.co/sail/Sailor-4B-Chat-gguf/blob/main/ggml-model-Q3_K_S.gguf) | Q3_K_S | 3 | 1.86 GB | very small, high quality loss |
| [ggml-model-Q4_K_M.gguf](https://huggingface.co/sail/Sailor-4B-Chat-gguf/blob/main/ggml-model-Q4_K_M.gguf) | Q4_K_M | 4 | 2.46 GB | medium, balanced quality |
| [ggml-model-Q4_K_S.gguf](https://huggingface.co/sail/Sailor-4B-Chat-gguf/blob/main/ggml-model-Q4_K_S.gguf) | Q4_K_S | 4 | 2.34 GB | small, greater quality loss |
| [ggml-model-Q5_K_M.gguf](https://huggingface.co/sail/Sailor-4B-Chat-gguf/blob/main/ggml-model-Q5_K_M.gguf) | Q5_K_M | 5 | 2.84 GB | large, balanced quality |
| [ggml-model-Q5_K_S.gguf](https://huggingface.co/sail/Sailor-4B-Chat-gguf/blob/main/ggml-model-Q5_K_S.gguf) | Q5_K_S | 5 | 2.78 GB | medium, very low quality loss |
| [ggml-model-Q6_K.gguf](https://huggingface.co/sail/Sailor-4B-Chat-gguf/blob/main/ggml-model-Q6_K.gguf) | Q6_K | 6 | 3.25 GB | very large, extremely low quality loss |
| [ggml-model-Q8_0.gguf](https://huggingface.co/sail/Sailor-4B-Chat-gguf/blob/main/ggml-model-Q8_0.gguf) | Q8_0 | 8 | 4.2 GB | very large, extremely low quality loss |
| [ggml-model-f16.gguf](https://huggingface.co/sail/Sailor-4B-Chat-gguf/blob/main/ggml-model-f16.gguf) | f16 | 16 | 7.91 GB | original size, no quality loss |
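Each file above can be fetched individually instead of cloning the whole repository. A minimal sketch using the `huggingface_hub` client (the Q4_K_M file is chosen here only as an example):

```python
# Sketch: download a single GGUF file (requires `pip install huggingface_hub`)
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="sail/Sailor-4B-Chat-gguf",
    filename="ggml-model-Q4_K_M.gguf",  # swap in any quant from the table above
)
print(local_path)  # local path to pass to llama.cpp / llama-cpp-python
```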
### How to run with `llama.cpp`

```shell
# install llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make
pip install -r requirements.txt

# generate with llama.cpp ("Cara memanggang ikan?" is Indonesian for "How do you grill fish?")
./main -ngl 40 -m ggml-model-Q4_K_M.gguf -p "<|im_start|>question\nCara memanggang ikan?\n<|im_start|>answer\n" --temp 0.7 --repeat_penalty 1.1 -n 400 -e
```

> Change `-ngl 40` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
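For an interactive endpoint rather than one-shot generation, llama.cpp also ships an HTTP server. A minimal sketch (binary name and flags vary across llama.cpp versions, so check `--help` for your build):

```shell
# Sketch: serve the model over HTTP with llama.cpp's bundled server
./server -m ggml-model-Q4_K_M.gguf -ngl 40 -c 2048 --host 127.0.0.1 --port 8080
```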
### How to run with `llama-cpp-python`

```shell
pip install llama-cpp-python
```
```python
import llama_cpp
import llama_cpp.llama_tokenizer

# load the GGUF weights, attaching the original HF tokenizer for correct tokenization
llama = llama_cpp.Llama.from_pretrained(
    repo_id="sail/Sailor-4B-Chat-gguf",
    filename="ggml-model-Q4_K_M.gguf",
    tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained("sail/Sailor-4B-Chat"),
    n_gpu_layers=40,  # layers to offload to GPU; remove without GPU acceleration
    n_threads=8,
    verbose=False,
)

# Sailor's chat template uses 'question'/'answer' in place of 'user'/'assistant'
system_role = 'system'
user_role = 'question'
assistant_role = 'answer'

system_prompt = \
    'You are an AI assistant named Sailor created by Sea AI Lab. ' \
    'Your answer should be friendly, unbiased, faithful, informative and detailed.'
system_prompt = f"<|im_start|>{system_role}\n{system_prompt}<|im_end|>"

# inference example ("Cara memanggang ikan?" is Indonesian for "How do you grill fish?")
output = llama(
    system_prompt + '\n' + f"<|im_start|>{user_role}\nCara memanggang ikan?\n<|im_start|>{assistant_role}\n",
    max_tokens=256,
    temperature=0.7,
    top_p=0.75,
    top_k=60,
    stop=["<|im_end|>", "<|endoftext|>"],
)

print(output['choices'][0]['text'])
```
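Since an HF tokenizer is attached above, the same `Llama` object can also be driven through llama-cpp-python's higher-level chat API, which applies the tokenizer's chat template. A minimal sketch (verify that the template maps onto Sailor's `question`/`answer` roles for your tokenizer version):

```python
# Sketch: the same model via the chat-completion API (reuses `llama` from above)
response = llama.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an AI assistant named Sailor created by Sea AI Lab."},
        {"role": "user", "content": "Cara memanggang ikan?"},
    ],
    max_tokens=256,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])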
### How to build demo

Install `llama-cpp-python` and `gradio`, then run the [demo script](https://github.com/sail-sg/sailor-llm/blob/main/demo/llamacpp_demo.py).
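As a rough sketch of the same idea (hypothetical code, not the contents of the linked script), a chat UI can be wired up in a few lines, reusing the `llama` object loaded above:

```python
# Sketch: minimal gradio chat demo (requires `pip install gradio`); not the linked script
import gradio as gr

def chat_fn(message, history):
    # Sailor's chat format: 'question' / 'answer' roles with <|im_start|> markers
    prompt = f"<|im_start|>question\n{message}\n<|im_start|>answer\n"
    out = llama(prompt, max_tokens=256, stop=["<|im_end|>", "<|endoftext|>"])
    return out["choices"][0]["text"]

gr.ChatInterface(chat_fn).launch()
```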
# License