ikaganacar committed on
Commit 8335a67 · verified · 1 Parent(s): 62dcb9c

Update README.md

Files changed (1)
  1. README.md +3 -145
README.md CHANGED
@@ -88,80 +88,6 @@ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
  print(response)
  ```

- ### Advanced Usage with Pipeline
-
- ```python
- from transformers import pipeline
-
- # Create text generation pipeline
- pipe = pipeline(
-     "text-generation",
-     model="aciklab/kubernetes-ai-4bit",
-     device_map="auto",
-     trust_remote_code=True
- )
-
- # Generate response
- messages = [
-     # "You are an AI assistant specialized in Kubernetes."
-     {"role": "system", "content": "Sen Kubernetes konusunda uzmanlaşmış bir yapay zeka asistanısın."},
-     # "What is the difference between a Pod and a Deployment?"
-     {"role": "user", "content": "Pod ve Deployment arasındaki fark nedir?"}
- ]
-
- response = pipe(
-     messages,
-     max_new_tokens=512,
-     temperature=1.0,
-     top_p=0.95,
-     do_sample=True
- )
-
- print(response[0]["generated_text"][-1]["content"])
- ```
-
- ### Streaming Responses
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer
- from threading import Thread
-
- model_name = "aciklab/kubernetes-ai-4bit"
- tokenizer = AutoTokenizer.from_pretrained(model_name)
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     device_map="auto",
-     trust_remote_code=True
- )
-
- # Prepare input ("Explain the Kubernetes Service types")
- prompt = "Kubernetes Service türlerini açıkla"
- messages = [
-     {"role": "system", "content": "Sen Kubernetes konusunda uzmanlaşmış bir yapay zeka asistanısın."},
-     {"role": "user", "content": prompt}
- ]
-
- input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
- inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
-
- # Set up the streamer; skip_prompt=True keeps the prompt out of the streamed output
- streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
- generation_kwargs = dict(
-     **inputs,
-     max_new_tokens=512,
-     temperature=1.0,
-     do_sample=True,
-     streamer=streamer
- )
-
- # Generate in a separate thread so the main thread can consume the stream
- thread = Thread(target=model.generate, kwargs=generation_kwargs)
- thread.start()
-
- # Stream output
- for text in streamer:
-     print(text, end="", flush=True)
-
- thread.join()
- ```
-
  ## Training Details

  This model is based on the [aciklab/kubernetes-ai](https://huggingface.co/aciklab/kubernetes-ai) LoRA adapters:
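This commit deletes the pipeline and streaming walkthroughs, but the chat-message structure they share survives in the retained quickstart. A minimal, dependency-free sketch of that flow — `render_chat` is a hypothetical stand-in for `tokenizer.apply_chat_template`, and the `<|role|>` delimiters are illustrative, not the model's real template:

```python
# Hypothetical stand-in for tokenizer.apply_chat_template; real chat
# templates are model-specific, so the <|role|> markers here are only
# illustrative.
def render_chat(messages, add_generation_prompt=True):
    parts = [f"<|{m['role']}|>\n{m['content']}" for m in messages]
    if add_generation_prompt:
        parts.append("<|assistant|>\n")  # leave room for the model's reply
    return "\n".join(parts)

messages = [
    # "You are an AI assistant specialized in Kubernetes."
    {"role": "system", "content": "Sen Kubernetes konusunda uzmanlaşmış bir yapay zeka asistanısın."},
    # "What is the difference between a Pod and a Deployment?"
    {"role": "user", "content": "Pod ve Deployment arasındaki fark nedir?"},
]

prompt = render_chat(messages)
```

The string produced here plays the role of the `input_text` that the retained quickstart tokenizes and passes to `model.generate`.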
@@ -205,88 +131,20 @@ This model uses 4bit quantization with BitsAndBytes for optimal memory efficienc
  ## Hardware Requirements

  ### Minimum (GPU)
- - **GPU:** 8GB VRAM (e.g., RTX 3060, RTX 4060)
  - **RAM:** 8GB system memory
  - **Storage:** 10GB free space
- - **Recommended:** CUDA-capable NVIDIA GPU
-
- ### Minimum (CPU Only)
- - **CPU:** 8+ cores
- - **RAM:** 16GB system memory
- - **Storage:** 10GB free space
- - **Note:** CPU inference will be slower than GPU

  ### Recommended
- - **GPU:** 12GB+ VRAM (e.g., RTX 3080, RTX 4070, RTX 5070)
  - **RAM:** 16GB system memory
  - **Storage:** 15GB free space
- - **CUDA:** 11.8 or higher
-
- ## Performance Benchmarks
-
- | Hardware | Tokens/Second | Latency (512 tokens) |
- |----------|---------------|----------------------|
- | RTX 5070 12GB | ~45-55 | ~10-12 seconds |
- | RTX 4060 8GB | ~35-45 | ~12-15 seconds |
- | CPU (16 cores) | ~5-10 | ~60-100 seconds |
-
- *Benchmarks are approximate and may vary based on system configuration*

- ## Inference Optimization Tips

- ### For Maximum Speed
- ```python
- # Use Flash Attention 2 (if available)
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     device_map="auto",
-     trust_remote_code=True,
-     attn_implementation="flash_attention_2"  # Requires the flash-attn package
- )
- ```
-
- ### For Lower Memory Usage
- ```python
- # 4bit NF4 quantization with double quantization for a smaller footprint
- import torch
- from transformers import BitsAndBytesConfig
-
- quantization_config = BitsAndBytesConfig(
-     load_in_4bit=True,
-     bnb_4bit_compute_dtype=torch.float16,
-     bnb_4bit_use_double_quant=True,
-     bnb_4bit_quant_type="nf4"
- )
-
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     quantization_config=quantization_config,
-     device_map="auto"
- )
- ```
-
- ## Example Queries
-
- ```python
- # Example 1: Creating a Deployment
- # ("How do I create an nginx deployment with 3 replicas in Kubernetes?")
- "Kubernetes'te 3 replikaya sahip bir nginx deployment nasıl oluştururum?"
-
- # Example 2: Service Explanation
- # ("What are the differences between the ClusterIP, NodePort and LoadBalancer service types?")
- "ClusterIP, NodePort ve LoadBalancer service türleri arasındaki farklar nelerdir?"
-
- # Example 3: Troubleshooting
- # ("My Pod is in CrashLoopBackOff; how can I debug it?")
- "Pod'um CrashLoopBackOff durumunda, nasıl debug edebilirim?"
-
- # Example 4: Configuration
- # ("What is the difference between a ConfigMap and a Secret, and when should I use which?")
- "ConfigMap ve Secret arasındaki fark nedir ve ne zaman hangisini kullanmalıyım?"
-
- # Example 5: Best Practices
- # ("What are the best practices for a Kubernetes deployment in production?")
- "Production ortamında Kubernetes deployment için en iyi pratikler nelerdir?"
- ```

  ## Limitations

- - **Language:** Optimized primarily for Turkish; English queries may work but with reduced quality
- - **Context Window:** 1024 tokens maximum sequence length
  - **Domain:** Specialized for Kubernetes; may not perform well on general topics
  - **Quantization:** 4bit quantization may occasionally affect response quality on complex queries
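The VRAM minimums kept above, and the benchmark table this commit deletes, follow from simple quantization arithmetic. A back-of-the-envelope sketch — the 8B parameter count and the 1.2 overhead factor are assumptions for illustration, since the diff states neither:

```python
def estimate_weight_memory_gb(n_params: float, bits_per_param: int = 4,
                              overhead: float = 1.2) -> float:
    """Rough VRAM estimate for quantized model weights.

    `overhead` is an assumed allowance for quantization constants,
    higher-precision layers (e.g. embeddings) and runtime buffers --
    not a measured figure.
    """
    return n_params * bits_per_param / 8 / 1e9 * overhead

# An assumed 8B-parameter model in 4-bit NF4:
print(round(estimate_weight_memory_gb(8e9), 2))  # 4.8 (GB)
```

At 16-bit precision the same arithmetic gives roughly 19 GB, which is why the unquantized weights would not fit the 8GB VRAM minimum quoted above.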
 
 
  ### Minimum (GPU)
+ - **GPU:** 8GB VRAM
  - **RAM:** 8GB system memory
  - **Storage:** 10GB free space

  ### Recommended
+ - **GPU:** 12GB+ VRAM
  - **RAM:** 16GB system memory
  - **Storage:** 15GB free space

  ## Limitations

+ - **Language:** Optimized primarily for Turkish and English.
  - **Domain:** Specialized for Kubernetes; may not perform well on general topics
  - **Quantization:** 4bit quantization may occasionally affect response quality on complex queries
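The first of the removed example queries asks how to create an nginx Deployment with three replicas. For reference, the manifest such an answer centers on looks like the following — a standard sketch using the stock Kubernetes `apps/v1` API, not output from the model:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.27   # illustrative tag; pin whatever version you need
```

Applied with `kubectl apply -f deployment.yaml`, this creates the Deployment and its three Pod replicas.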