saikanov committed
Commit 921ef29
1 Parent(s): f318f2a

Update README.md

Files changed (1)
  1. README.md +5 -59
README.md CHANGED
@@ -10,7 +10,7 @@ This is a GGUF quantized version of Gemma 2 9B, fine-tuned with custom instructi
  - **Base Model**: Gemma 2 9B
  - **Instruction Format**: SahabatAI Instruct v1
  - **Quantization**: GGUF Q4_K_M (4-bit with Medium precision for Key/Value cache)
- - **Original Size**: 9B parameters
+ - **Original Size**: 18GB
  - **Quantized Size**: ~5GB
  - **Context Length**: 8192 tokens
  - **License**: Gemma Terms of Use
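
(For context, the corrected figure is consistent: 9B parameters at 16-bit precision is two bytes per parameter, about 18 GB, while Q4_K_M's roughly 4.5–5 bits per parameter matches the ~5 GB quantized size.)
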
@@ -27,7 +27,7 @@ This model is a quantized version of Gemma 2 9B, fine-tuned with custom instruct
  ```bash
  git clone https://github.com/oobabooga/text-generation-webui
  cd text-generation-webui
- pip install -r requirements.txt
+ ./start_linux.sh  # or start_windows.bat / start_macos.sh, depending on your OS
  ```
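
(The one-click start scripts bootstrap text-generation-webui's own Python environment and install its requirements on first run, so a separate `pip install -r requirements.txt` is no longer needed.)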
 
  2. **Download Model**:
@@ -37,24 +37,6 @@ cd models
  # Download gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf from Hugging Face
  ```
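
That download step can also be scripted with the huggingface_hub library. A minimal sketch; the repo id below is an assumption, since the README only names the file:

```python
# Sketch of the download step above using huggingface_hub.
# NOTE: the repo_id is an assumed placeholder -- the README names
# only the .gguf file, not the repository it lives in.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="saikanov/gemma2-9B-cpt-sahabatai-instruct-v1-gguf",  # assumption
    filename="gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf",
    local_dir="models",  # text-generation-webui looks for models here
)
print(f"Downloaded to {path}")
```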
 
- 3. **Launch the Web UI**:
- ```bash
- python server.py --model gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf
- ```
-
- ### Recommended Launch Parameters
-
- For optimal performance on different hardware:
-
- **CPU Only**:
- ```bash
- python server.py --model gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf --cpu --n_ctx 8192
- ```
-
- **GPU (CUDA)**:
- ```bash
- python server.py --model gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf --n_ctx 8192 --gpu-memory 6
- ```
 
  ### Recommended Generation Parameters
 
@@ -66,24 +48,7 @@ repetition_penalty: 1.1
  max_new_tokens: 2048
  ```
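
The two settings visible here (repetition_penalty 1.1 and max_new_tokens 2048) carry over directly to llama.cpp-based runtimes. A minimal sketch with llama-cpp-python, leaving every other sampler setting at its library default rather than guessing the elided values:

```python
# Sketch: applying the generation parameters visible in this hunk via
# llama-cpp-python. Only repetition_penalty (1.1) and max_new_tokens
# (2048) appear in the diff; remaining sampler settings use defaults.
from llama_cpp import Llama

llm = Llama(
    model_path="models/gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf",
    n_ctx=8192,  # context length from the model card
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Tulis fungsi faktorial dalam Python."}],
    repeat_penalty=1.1,  # README: repetition_penalty 1.1
    max_tokens=2048,     # README: max_new_tokens 2048
)
print(out["choices"][0]["message"]["content"])
```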
 
- ### Instruction Format
 
- The model responds best to this instruction format:
- ```
- <|system|>You are a helpful AI assistant.</|system|>
-
- <|user|>Your question here</|user|>
-
- <|assistant|>
- ```
-
- ## Performance Benchmarks
-
- | Device                | Tokens/sec | Memory Usage |
- |-----------------------|------------|--------------|
- | CPU (8 cores)         | ~15 t/s    | 6GB          |
- | NVIDIA RTX 3060 (6GB) | ~40 t/s    | 5GB          |
- | NVIDIA RTX 4090       | ~100 t/s   | 5GB          |
 
  ## Example Outputs
 
@@ -104,38 +69,19 @@ def factorial(n):
  return n * factorial(n-1)
  ```
 
- ## Known Limitations
-
- - Requires minimum 6GB RAM for CPU inference
- - Best performance with GPU having 6GB+ VRAM
- - May show degraded performance on very long contexts (>4096 tokens)
- - Quantization may impact some mathematical and logical reasoning tasks
-
- ## Fine-tuning Details
-
- - Base Model: Gemma 2 9B
- - Instruction Format: Custom SahabatAI format
- - Quantization: Q4_K_M using llama.cpp
-
  ## License
 
  This model is subject to the Gemma Terms of Use. Please refer to Google's Gemma licensing terms for commercial usage.
 
  ## Acknowledgments
 
+ - SahabatAI for fine-tuning the model
  - Google for the Gemma 2 base model
- - SahabatAI for instruction fine-tuning
- - TheBloke for GGUF conversion tools
+ - llama.cpp for GGUF conversion tools
  - oobabooga for text-generation-webui
 
  ## Support
 
  For issues and questions:
  - Open an issue in this repository
- - Visit our Discord: [Your Discord Link]
- - Email: [Your Support Email]
-
- ## Updates & Versions
-
- - v1.0 (2024-03): Initial release with Q4_K_M quantization
- - Future updates will be listed here
+ - Discord: [Your Discord Link]