Update README.md
README.md CHANGED
@@ -10,7 +10,7 @@ This is a GGUF quantized version of Gemma 2 9B, fine-tuned with custom instructions
 - **Base Model**: Gemma 2 9B
 - **Instruction Format**: SahabatAI Instruct v1
 - **Quantization**: GGUF Q4_K_M (4-bit k-quant, medium size variant)
-- **Original Size**:
+- **Original Size**: 18GB
 - **Quantized Size**: ~5GB
 - **Context Length**: 8192 tokens
 - **License**: Gemma Terms of Use
@@ -27,7 +27,7 @@ This model is a quantized version of Gemma 2 9B, fine-tuned with custom instructions
 ```bash
 git clone https://github.com/oobabooga/text-generation-webui
 cd text-generation-webui
-
+# run the start script for your OS (start_linux.sh, start_windows.bat, or start_macos.sh)
 ```
 
 2. **Download Model**:
@@ -37,24 +37,6 @@ cd models
 # Download gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf from Hugging Face
 ```
 
-3. **Launch the Web UI**:
-```bash
-python server.py --model gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf
-```
-
-### Recommended Launch Parameters
-
-For optimal performance on different hardware:
-
-**CPU Only**:
-```bash
-python server.py --model gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf --cpu --n_ctx 8192
-```
-
-**GPU (CUDA)**:
-```bash
-python server.py --model gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf --n_ctx 8192 --gpu-memory 6
-```
 
 ### Recommended Generation Parameters
 
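For reference alongside the download step and the launch flags removed above, here is a minimal Python sketch that fetches the GGUF file and loads it with `huggingface_hub` and `llama-cpp-python`. The repo id is a placeholder and the `n_gpu_layers` choice is an assumption; only the filename and the 8192-token context come from this README.

```python
# Sketch (not from the README): download the quantized file and load it,
# mirroring the removed --n_ctx 8192 launch parameters.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# NOTE: "your-org/gemma2-9b-sahabatai-gguf" is a placeholder repo id.
model_path = hf_hub_download(
    repo_id="your-org/gemma2-9b-sahabatai-gguf",
    filename="gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf",
)

llm = Llama(
    model_path=model_path,
    n_ctx=8192,       # matches the README's stated context length
    n_gpu_layers=-1,  # offload all layers if a GPU is available; 0 for CPU-only
)
```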
@@ -66,24 +48,7 @@ repetition_penalty: 1.1
 max_new_tokens: 2048
 ```
 
-### Instruction Format
 
-The model responds best to this instruction format:
-```
-<|system|>You are a helpful AI assistant.</|system|>
-
-<|user|>Your question here</|user|>
-
-<|assistant|>
-```
-
-## Performance Benchmarks
-
-| Device                | Tokens/sec | Memory Usage |
-|-----------------------|------------|--------------|
-| CPU (8 cores)         | ~15 t/s    | 6GB          |
-| NVIDIA RTX 3060 (6GB) | ~40 t/s    | 5GB          |
-| NVIDIA RTX 4090       | ~100 t/s   | 5GB          |
 
 ## Example Outputs
 
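The generation settings above, together with the instruction format this diff removes, can be exercised programmatically. A sketch assuming `llm` is the `Llama` instance from the previous snippet; the stop strings are guesses rather than documented values, and the README's temperature setting is not visible in this hunk, so it is omitted.

```python
# Sketch: apply the README's visible settings (repetition_penalty 1.1,
# max_new_tokens 2048) with the removed SahabatAI instruction template.
prompt = (
    "<|system|>You are a helpful AI assistant.</|system|>\n\n"
    "<|user|>Write a Python function that computes a factorial.</|user|>\n\n"
    "<|assistant|>"
)

out = llm(
    prompt,
    max_tokens=2048,     # README's max_new_tokens
    repeat_penalty=1.1,  # README's repetition_penalty
    stop=["</|assistant|>", "<|user|>"],  # assumed stop strings, not from the README
)
print(out["choices"][0]["text"])
```

Dividing `out["usage"]["completion_tokens"]` by wall-clock time gives a rough tokens/sec figure comparable to the benchmarks table removed in this hunk.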
@@ -104,38 +69,19 @@ def factorial(n):
     return n * factorial(n-1)
 ```
 
-## Known Limitations
-
-- Requires minimum 6GB RAM for CPU inference
-- Best performance with GPU having 6GB+ VRAM
-- May show degraded performance on very long contexts (>4096 tokens)
-- Quantization may impact some mathematical and logical reasoning tasks
-
-## Fine-tuning Details
-
-- Base Model: Gemma 2 9B
-- Instruction Format: Custom SahabatAI format
-- Quantization: Q4_K_M using llama.cpp
-
 ## License
 
 This model is subject to the Gemma Terms of Use. Please refer to Google's Gemma licensing terms for commercial usage.
 
 ## Acknowledgments
 
+- SahabatAI for fine-tuning the model
 - Google for the Gemma 2 base model
--
-- TheBloke for GGUF conversion tools
+- llama.cpp for GGUF conversion tools
 - oobabooga for text-generation-webui
 
 ## Support
 
 For issues and questions:
 - Open an issue in this repository
--
-- Email: [Your Support Email]
-
-## Updates & Versions
-
-- v1.0 (2024-03): Initial release with Q4_K_M quantization
-- Future updates will be listed here
+- Discord: [Your Discord Link]
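The removed Fine-tuning Details section stated the quantization was done with Q4_K_M using llama.cpp. A hedged sketch of that pipeline with llama.cpp's standard conversion and quantization tools follows; the paths are assumptions, and the exact tool names can differ between llama.cpp builds.

```python
# Sketch of how the ~18GB checkpoint likely became the ~5GB Q4_K_M file.
# Run from a llama.cpp checkout; paths and checkpoint layout are assumptions.
import subprocess

# 1. Convert the fine-tuned Hugging Face checkpoint to a 16-bit GGUF.
subprocess.run(
    ["python", "convert_hf_to_gguf.py", "path/to/sahabatai-checkpoint",
     "--outfile", "gemma2-9b-f16.gguf"],
    check=True,
)

# 2. Quantize the 16-bit GGUF down to Q4_K_M.
subprocess.run(
    ["./llama-quantize", "gemma2-9b-f16.gguf",
     "gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```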