King-Kode-128K
An advanced AI-powered coding assistant model based on DeepSeek-Coder-33B-Instruct, with an extended 128K context window and specialized debugging instruction tokens.
**Model Summary**
- Base Model: DeepSeek-Coder-33B-Instruct
- Model Type: Causal Language Model (AutoRegressive Transformer)
- Architecture: Transformer with Flash Attention 2
- Context Length: 128K tokens (extended from 16K)
- Training Paradigm: Pretrained on diverse programming languages, optimized for debugging, refactoring, and performance analysis.
- Inference: Optimized for CUDA 12.2 with FlashAttention for faster computations.
**Key Features**
- **Extended RoPE (128K Context Length)**: Context extended from 16K to 128K using rotary position embedding (RoPE) scaling (see the config check after this list).
- **AI Debugging & Code Optimization**: Includes specialized tokens for debugging, refactoring, and AI-assisted improvements.
- **Token Expansion**: Integrates 21+ custom instruction tokens for better AI-guided development.
- **Efficient Execution**: Supports FlashAttention-2, 4-bit quantization, and optimized CUDA inference.
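The extended context window and RoPE scaling are recorded in the model configuration. A minimal sketch for checking them, assuming the standard DeepSeek/LLaMA-style config fields `max_position_embeddings` and `rope_scaling`:

```python
from transformers import AutoConfig

model_name = "KingyJr/King-Kode-128K"

# Inspect the published config without downloading the weights.
config = AutoConfig.from_pretrained(model_name)

# Expected to report the 128K context window and the 8x dynamic RoPE scaling
# described above (field names are an assumption based on the DeepSeek/LLaMA
# config layout).
print("Context length:", config.max_position_embeddings)
print("RoPE scaling:", getattr(config, "rope_scaling", None))
```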
🛠️ Model Details
| Parameter | Value |
|---|---|
| Base Model | DeepSeek-Coder-33B-Instruct |
| Context Length | 128,000 tokens (128K) |
| RoPE Scaling | 8x Dynamic Scaling |
| Precision | FP16 & 4-bit |
| Flash Attention | FlashAttention-2 Enabled |
| Tokenizer | DeepSeek 33B Tokenizer with custom tokens |
| Architecture | Transformer |
| Layers | 80 |
| Hidden Size | 7168 |
| Attention Heads | 56 |
| Training Data | Python, JavaScript, C++, Java, Rust, SQL, Bash, HTML, JSON, YAML, etc. |
| Batch Size | Optimized for large-scale inference |
Custom Instruction Tokens
King-Kode-128K introduces 21+ custom tokens for AI-powered debugging, code refactoring, and performance optimization.
| Token | Purpose |
|---|---|
| `<DEBUG>` | Debug the code and detect issues. |
| `<FIX_CODE>` | Automatically fix errors in the script. |
| `<AUTOCOMPLETE>` | Provide AI-powered code completion. |
| `<AUTO_DEBUG>` | Perform automatic debugging with AI. |
| `<DEBUG_SCRIPT>` | Debug the entire script at once. |
| `<GENERATE>` | Generate code snippets based on user instructions. |
| `<EXPLAIN_ERROR>` | Explain errors in simple terms. |
| `<SUGGEST_FIX>` | Provide AI-powered fixes for detected issues. |
| `<CODE_REFACTOR>` | Suggest and apply better coding practices. |
| `<OPTIMIZE_PERFORMANCE>` | Identify bottlenecks and optimize code. |
| `<FORMAT_CODE>` | Format the code to follow best practices. |
| `<REWRITE_CODE>` | Rewrite existing code for clarity or efficiency. |
| `<CODE_SUMMARY>` | Summarize what the code does. |
| `<AUTO_COMMENT>` | Add meaningful comments to the code. |
| `<RUN_TESTS>` | AI-assisted test execution. |
| `<ANALYZE_COMPLEXITY>` | Analyze algorithm complexity (Big O analysis). |
| `<FIX_BUG>` | Automatically identify and fix bugs. |
| `<CHECK_SYNTAX>` | Check syntax errors in real-time. |
| `<AI_ASSIST>` | General AI assistant command for debugging & coding. |
| `<INTELLIGENT_COMPLETION>` | AI-driven code completion for large-scale scripts. |
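As a quick sanity check, the instruction tokens can be inspected directly in the tokenizer. A minimal sketch, assuming the tokens are registered as additional special tokens so each encodes to a single vocabulary id:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("KingyJr/King-Kode-128K")

# If the custom instruction tokens are registered as special tokens,
# each should map to exactly one vocabulary id instead of being split
# into sub-word pieces.
for tok in ["<DEBUG>", "<FIX_CODE>", "<CODE_REFACTOR>", "<ANALYZE_COMPLEXITY>"]:
    ids = tokenizer(tok, add_special_tokens=False)["input_ids"]
    print(f"{tok}: {ids}")
```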
How to Use
Load Model in Python
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "KingyJr/King-Kode-128K"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load model with FlashAttention-2 optimization (requires the flash-attn package)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16,
    attn_implementation="flash_attention_2", device_map="auto"
)
print("Model and tokenizer loaded successfully!")
```
Example Usage (AI Debugging)
prompt = "<DEBUG> Fix the following Python code:
print(Hello World)"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
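The other instruction tokens follow the same prompt pattern. For example (illustrative only, reusing the model and tokenizer loaded above):

```python
# Ask for a Big-O analysis of a snippet using the <ANALYZE_COMPLEXITY> token
prompt = (
    "<ANALYZE_COMPLEXITY>\n"
    "def find(xs, target):\n"
    "    for i, x in enumerate(xs):\n"
    "        if x == target:\n"
    "            return i\n"
    "    return -1"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```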
Model Files
| File | Description |
|---|---|
| `config.json` | Model configuration. |
| `generation_config.json` | Generation settings. |
| `model.safetensors.index.json` | Index of model shards. |
| `model-00001-of-00007.safetensors` to `model-00007-of-00007.safetensors` | Model weights. |
| `special_tokens_map.json` | Special token mapping. |
| `tokenizer.json` | Tokenizer settings. |
| `tokenizer_config.json` | Tokenizer configuration. |
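To stage the sharded weights locally (for example on a machine that should not fetch from the Hub at inference time), the files above can be downloaded ahead of time. A minimal sketch using `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

# Download all model files listed above into the local cache;
# the returned path can be passed to from_pretrained() instead of the repo id.
local_dir = snapshot_download(repo_id="KingyJr/King-Kode-128K")
print("Files downloaded to:", local_dir)
```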
Storage & Deployment
- Inference Optimizations: FlashAttention, quantization (4-bit & 8-bit), and tensor parallelism (see the 4-bit loading sketch after this list).
- Fine-tuning & RLHF: Can be extended with custom fine-tuning or RLHF.
License & Usage
- License: Apache 2.0
- Intended Use: AI-assisted debugging, software development, and optimization.
- Restrictions: Not for commercial resale without modification.
Citation
If you use King-Kode-128K, please cite:
```bibtex
@article{KingKode128K,
  author  = {KingyJr},
  title   = {King-Kode-128K: AI Debugging and Code Assistant},
  journal = {Hugging Face Models},
  year    = {2025},
  url     = {https://huggingface.co/KingyJr/King-Kode-128K}
}
```
Contributors
- KingyJr - Lead Developer
- DeepSeek-AI - Base Model Contributors
Contact & Support
For issues, suggestions, or contributions, please open a GitHub issue or contact KingyJr.
This model is designed to enhance software development with AI-driven debugging, optimization, and code generation. Enjoy coding!