Update README.md
README.md CHANGED
@@ -10,7 +10,7 @@ This is a GGUF quantized version of Gemma 2 9B, fine-tuned with custom instructions
 - **Base Model**: Gemma 2 9B
 - **Instruction Format**: SahabatAI Instruct v1
 - **Quantization**: GGUF Q4_K_M (4-bit k-quant, medium size variant)
-- **Original Size**:
+- **Original Size**: 18GB
 - **Quantized Size**: ~5GB
 - **Context Length**: 8192 tokens
 - **License**: Gemma Terms of Use
@@ -27,7 +27,7 @@ This model is a quantized version of Gemma 2 9B, fine-tuned with custom instructions
 ```bash
 git clone https://github.com/oobabooga/text-generation-webui
 cd text-generation-webui
-
+# run the start script for your OS (start_linux.sh, start_windows.bat, or start_macos.sh)
 ```
 
 2. **Download Model**:
@@ -37,24 +37,6 @@ cd models
 # Download gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf from Hugging Face
 ```
 
-3. **Launch the Web UI**:
-```bash
-python server.py --model gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf
-```
-
-### Recommended Launch Parameters
-
-For optimal performance on different hardware:
-
-**CPU Only**:
-```bash
-python server.py --model gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf --cpu --n_ctx 8192
-```
-
-**GPU (CUDA)**:
-```bash
-python server.py --model gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf --n_ctx 8192 --gpu-memory 6
-```
 
 ### Recommended Generation Parameters
 
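For reference alongside the download step and the launch flags removed above, here is a minimal Python sketch that fetches the GGUF file and loads it with `huggingface_hub` and `llama-cpp-python`. The repo id is a placeholder and the `n_gpu_layers` choice is an assumption; only the filename and the 8192-token context come from this README.

```python
# Sketch (not from the README): download the quantized file and load it,
# mirroring the removed --n_ctx 8192 launch parameters.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# NOTE: "your-org/gemma2-9b-sahabatai-gguf" is a placeholder repo id.
model_path = hf_hub_download(
    repo_id="your-org/gemma2-9b-sahabatai-gguf",
    filename="gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf",
)

llm = Llama(
    model_path=model_path,
    n_ctx=8192,       # matches the README's stated context length
    n_gpu_layers=-1,  # offload all layers if a GPU is available; 0 for CPU-only
)
```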
@@ -66,24 +48,7 @@ repetition_penalty: 1.1
 max_new_tokens: 2048
 ```
 
-### Instruction Format
 
-The model responds best to this instruction format:
-```
-<|system|>You are a helpful AI assistant.</|system|>
-
-<|user|>Your question here</|user|>
-
-<|assistant|>
-```
-
-## Performance Benchmarks
-
-| Device                | Tokens/sec | Memory Usage |
-|-----------------------|------------|--------------|
-| CPU (8 cores)         | ~15 t/s    | 6GB          |
-| NVIDIA RTX 3060 (6GB) | ~40 t/s    | 5GB          |
-| NVIDIA RTX 4090       | ~100 t/s   | 5GB          |
 
 ## Example Outputs
 
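The generation settings above, together with the instruction format this diff removes, can be exercised programmatically. A sketch assuming `llm` is the `Llama` instance from the previous snippet; the stop strings are guesses rather than documented values, and the README's temperature setting is not visible in this hunk, so it is omitted.

```python
# Sketch: apply the README's visible settings (repetition_penalty 1.1,
# max_new_tokens 2048) with the removed SahabatAI instruction template.
prompt = (
    "<|system|>You are a helpful AI assistant.</|system|>\n\n"
    "<|user|>Write a Python function that computes a factorial.</|user|>\n\n"
    "<|assistant|>"
)

out = llm(
    prompt,
    max_tokens=2048,     # README's max_new_tokens
    repeat_penalty=1.1,  # README's repetition_penalty
    stop=["</|assistant|>", "<|user|>"],  # assumed stop strings, not from the README
)
print(out["choices"][0]["text"])
```

Dividing `out["usage"]["completion_tokens"]` by wall-clock time gives a rough tokens/sec figure comparable to the benchmarks table removed in this hunk.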
@@ -104,38 +69,19 @@ def factorial(n):
     return n * factorial(n-1)
 ```
 
-## Known Limitations
-
-- Requires minimum 6GB RAM for CPU inference
-- Best performance with GPU having 6GB+ VRAM
-- May show degraded performance on very long contexts (>4096 tokens)
-- Quantization may impact some mathematical and logical reasoning tasks
-
-## Fine-tuning Details
-
-- Base Model: Gemma 2 9B
-- Instruction Format: Custom SahabatAI format
-- Quantization: Q4_K_M using llama.cpp
-
 ## License
 
 This model is subject to the Gemma Terms of Use. Please refer to Google's Gemma licensing terms for commercial usage.
 
 ## Acknowledgments
 
+- SahabatAI for fine-tuning the model
 - Google for the Gemma 2 base model
--
-- TheBloke for GGUF conversion tools
+- llama.cpp for GGUF conversion tools
 - oobabooga for text-generation-webui
 
 ## Support
 
 For issues and questions:
 - Open an issue in this repository
--
-- Email: [Your Support Email]
-
-## Updates & Versions
-
-- v1.0 (2024-03): Initial release with Q4_K_M quantization
-- Future updates will be listed here
+- Discord: [Your Discord Link]
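The removed Fine-tuning Details section stated the quantization was done with Q4_K_M using llama.cpp. A hedged sketch of that pipeline with llama.cpp's standard conversion and quantization tools follows; the paths are assumptions, and the exact tool names can differ between llama.cpp builds.

```python
# Sketch of how the ~18GB checkpoint likely became the ~5GB Q4_K_M file.
# Run from a llama.cpp checkout; paths and checkpoint layout are assumptions.
import subprocess

# 1. Convert the fine-tuned Hugging Face checkpoint to a 16-bit GGUF.
subprocess.run(
    ["python", "convert_hf_to_gguf.py", "path/to/sahabatai-checkpoint",
     "--outfile", "gemma2-9b-f16.gguf"],
    check=True,
)

# 2. Quantize the 16-bit GGUF down to Q4_K_M.
subprocess.run(
    ["./llama-quantize", "gemma2-9b-f16.gguf",
     "gemma2-9B-cpt-sahabatai-instruct-v1-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```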