The snippets below show how to use micymike/codemate-qwen-1.5B with libraries and local serving apps.
- Libraries
- Transformers
How to use micymike/codemate-qwen-1.5B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="micymike/codemate-qwen-1.5B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly (a causal LM, so AutoModelForCausalLM is the matching auto class)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("micymike/codemate-qwen-1.5B")
model = AutoModelForCausalLM.from_pretrained("micymike/codemate-qwen-1.5B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Local Apps
- vLLM
How to use micymike/codemate-qwen-1.5B with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "micymike/codemate-qwen-1.5B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "micymike/codemate-qwen-1.5B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
docker model run hf.co/micymike/codemate-qwen-1.5B
- SGLang
How to use micymike/codemate-qwen-1.5B with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "micymike/codemate-qwen-1.5B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "micymike/codemate-qwen-1.5B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "micymike/codemate-qwen-1.5B" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "micymike/codemate-qwen-1.5B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use micymike/codemate-qwen-1.5B with Docker Model Runner:
docker model run hf.co/micymike/codemate-qwen-1.5B
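The vLLM and SGLang servers above both expose an OpenAI-compatible /v1/chat/completions endpoint, so the curl calls can also be made from Python. The following is a minimal sketch that uses only the standard library. It assumes a server is already running on localhost (port 8000 for vLLM, 30000 for SGLang, as in the commands above). The helper names build_chat_request, extract_reply, and chat are hypothetical, and the max_tokens default is an arbitrary choice.

```python
import json
import urllib.request

# Assumed local endpoint: vLLM defaults to port 8000; use port 30000 for the SGLang server above.
BASE_URL = "http://localhost:8000"
MODEL_ID = "micymike/codemate-qwen-1.5B"


def build_chat_request(prompt: str, base_url: str = BASE_URL,
                       max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion response."""
    return response_body["choices"][0]["message"]["content"]


def chat(prompt: str, base_url: str = BASE_URL) -> str:
    """Send the request to a running server and return the reply text."""
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=120) as response:
        return extract_reply(json.loads(response.read().decode("utf-8")))
```

With a server running, chat("Write a Python function that reverses a string.") returns the model's reply as a string. Building the request separately from sending it makes the payload easy to inspect or test without a live server.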
CodeMate-Qwen
Model Details
Model Description
CodeMate-Qwen is a coding-focused language model fine-tuned from Qwen2.5-Coder-1.5B using Low-Rank Adaptation (LoRA). The model is designed to assist developers with code generation, debugging, code explanation, refactoring, and other software engineering tasks.
The project was created to explore parameter-efficient fine-tuning techniques and build a lightweight coding assistant capable of supporting real-world development workflows.
Developed by
Michael Moses
Funded by
Self-funded personal research project.
Shared by
Michael Moses
Model Type
Causal (decoder-only) language model for code generation and software engineering assistance.
Language(s)
English
Programming Languages:
- Python
- JavaScript
- TypeScript
- HTML
- CSS
- SQL
- General programming concepts
License
Apache 2.0 (subject to the licensing terms of the base Qwen model).
Finetuned From
Qwen/Qwen2.5-Coder-1.5B
Model Sources
Repository
GitHub: https://github.com/micymike
Hugging Face
https://huggingface.co/micymike
Demo
Coming Soon
Uses
Direct Use
This model is intended for:
- Code generation
- Debugging assistance
- Programming education
- Code explanation
- Refactoring recommendations
- Developer productivity workflows
- AI-assisted software development
Downstream Use
Potential downstream applications include:
- Coding copilots
- Educational coding assistants
- Automated code review systems
- Software engineering support tools
- Programming tutors
Out-of-Scope Use
This model is not intended for:
- Legal advice
- Medical advice
- Financial decision-making
- Safety-critical systems
- Autonomous code deployment without human review
Generated code should always be reviewed and tested before production use.
Bias, Risks, and Limitations
Like all large language models, CodeMate-Qwen may:
- Generate incorrect code
- Produce insecure implementations
- Hallucinate APIs or libraries
- Miss edge cases
- Reflect biases present in training data
Users should validate all generated outputs before deployment.
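For Python output, one lightweight form of validation is to syntax-check the generated code and then run it together with developer-written tests in a separate interpreter process. The following sketch shows that workflow. The names is_valid_python and passes_tests are hypothetical helpers, not part of CodeMate-Qwen or any library. A subprocess contains crashes and hangs but is not a security sandbox, so untrusted code should still run in a container or a similarly isolated environment.

```python
import ast
import subprocess
import sys
import tempfile
from pathlib import Path


def is_valid_python(source: str) -> bool:
    """Return True if the source parses as Python (a syntax check only)."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def passes_tests(source: str, test_code: str, timeout: float = 10.0) -> bool:
    """Run generated code plus developer-written tests in a separate process.

    Returns True only if the combined script exits with status 0 before the
    timeout. A child interpreter keeps crashes or infinite loops away from
    the caller, but it is NOT a security sandbox.
    """
    if not is_valid_python(source):
        return False
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(source + "\n\n" + test_code, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
    return result.returncode == 0
```

For example, passes_tests(generated_source, "assert add(2, 3) == 5") returns True only if the generated code parses, runs, and satisfies the assertion within the timeout.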
Recommendations
The model performs best when:
- Prompts are clear and specific
- Sufficient context is provided
- Outputs are reviewed by a developer
The model should be considered an assistant rather than a replacement for software engineering expertise.
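These recommendations can be applied consistently by building each prompt from a specific task statement plus the relevant code and error output. The following sketch shows one possible layout. The build_prompt helper and its section labels are hypothetical, and they may not match the format of the data CodeMate-Qwen was fine-tuned on.

```python
def build_prompt(task: str, code: str = "", error: str = "",
                 language: str = "python") -> str:
    """Assemble a specific, context-rich prompt (hypothetical helper)."""
    parts = [f"Task: {task.strip()}"]
    if code:
        fence = "`" * 3  # three backticks, i.e. a Markdown code fence
        parts.append(f"Relevant code:\n{fence}{language}\n{code.strip()}\n{fence}")
    if error:
        parts.append(f"Error message:\n{error.strip()}")
    # A closing instruction that states exactly what kind of answer is wanted
    parts.append("Explain the cause briefly, then give the corrected code.")
    return "\n\n".join(parts)
```

For instance, build_prompt("Fix the ZeroDivisionError in average().", code=..., error=...) produces a prompt that contains the task, the failing function, and the traceback message. This gives the model the context that a one-line request would leave out.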
How to Get Started
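```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "micymike/codemate-qwen-merged"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" (requires the accelerate package) places weights on the available GPUs/CPU
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto"
)

prompt = "Write a Python function that checks if a number is prime."
# Move the tokenized prompt to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256
)

# outputs[0] holds the prompt tokens followed by the generated tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```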
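The decoded text may include explanation around the code, and coding models often wrap code in Markdown fences. If you only need the code, a small post-processing step can extract it. The following sketch assumes fenced output, which is common but not guaranteed. extract_code_blocks is a hypothetical helper.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built here so it cannot close this block
# Opening fence with an optional language tag, then the body (captured lazily) up to the next fence.
CODE_BLOCK = re.compile(FENCE + r"[\w+-]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(text: str) -> list[str]:
    """Return the bodies of all fenced code blocks in the model output.

    Falls back to the whole (stripped) text when no fence is present.
    """
    blocks = [block.strip() for block in CODE_BLOCK.findall(text)]
    return blocks or [text.strip()]
```

The fallback keeps unfenced answers usable. The extracted code can then be checked with the kind of test harness that the Bias, Risks, and Limitations section calls for.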
Training Details
Training Data
The training dataset consisted of instruction-response pairs focused on software engineering and programming-related tasks.
Examples included:
- Bug fixing
- Code generation
- Code explanation
- Refactoring
- Programming Q&A
- Developer workflow assistance
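The card does not specify how these pairs were stored. A common layout for chat-style fine-tuning is JSON Lines with one messages list per example. The following sketch assumes that layout. The field names instruction and response, the helper names, and the example record are hypothetical and are not taken from the CodeMate-Qwen dataset.

```python
import json

# Hypothetical example pair; the actual CodeMate-Qwen training data is not included in this card.
pairs = [
    {
        "instruction": "Fix the bug: def last(xs): return xs[len(xs)]",
        "response": "Indexing with len(xs) is out of range. Use: def last(xs): return xs[-1]",
    },
]


def to_chat_record(pair: dict) -> dict:
    """Map one instruction-response pair to a chat-style training record."""
    return {
        "messages": [
            {"role": "user", "content": pair["instruction"]},
            {"role": "assistant", "content": pair["response"]},
        ]
    }


def to_jsonl(pair_list: list) -> str:
    """Serialize pairs as JSON Lines: one training example per line."""
    return "\n".join(json.dumps(to_chat_record(p), ensure_ascii=False) for p in pair_list)
```

json.dumps escapes newlines inside strings, so multi-line code in a response still occupies a single JSONL line.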
Training Procedure
The model was fine-tuned using LoRA (Low-Rank Adaptation), allowing efficient adaptation of the base model while training only a small subset of parameters.
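LoRA keeps each adapted pretrained weight W (shape d_out x d_in) frozen and learns a low-rank update (alpha / r) * B A, where A is r x d_in, B is d_out x r, and the rank r is much smaller than either dimension. B is initialized to zero, so training starts from the base model's behavior. The following pure-Python sketch illustrates the forward pass, the merge step that folds the adapter back into W, and the parameter savings. The rank, alpha, and matrix sizes are toy values. The card does not report the settings actually used for CodeMate-Qwen.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, W, A, B, alpha, r):
    """Frozen base path W x plus the scaled low-rank path (alpha / r) * B (A x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


def merge_lora(W, A, B, alpha, r):
    """Fold the adapter into one dense weight: W + (alpha / r) * (B A)."""
    delta = matmul(B, A)
    return [[w + (alpha / r) * d for w, d in zip(row_w, row_d)]
            for row_w, row_d in zip(W, delta)]


def trainable_params(d_out, d_in, r):
    """Return (full fine-tuning count, LoRA count) for one d_out x d_in weight."""
    return d_out * d_in, r * (d_in + d_out)


# Toy sizes for illustration only; not CodeMate-Qwen's actual configuration.
random.seed(0)
d_in, d_out, r, alpha = 6, 4, 2, 4
W = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(r)]      # trainable
B_init = [[0.0] * r for _ in range(d_out)]  # zero init: the adapter starts as a no-op
B_trained = [[random.uniform(-1, 1) for _ in range(r)] for _ in range(d_out)]  # stand-in for learned values
x = [random.uniform(-1, 1) for _ in range(d_in)]
```

For a hypothetical 4096 x 4096 projection with r = 8, trainable_params reports 65,536 LoRA parameters against 16,777,216 for full fine-tuning, roughly 0.4%. After merging, the result is an ordinary dense weight, so a merged checkpoint loads with AutoModelForCausalLM and does not need the PEFT library.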
Training Regime
- Base Model: Qwen2.5-Coder-1.5B
- Fine-Tuning Method: LoRA
- Framework: Hugging Face Transformers
- Parameter-Efficient Fine-Tuning Library: Hugging Face PEFT
- Backend: PyTorch
Evaluation
Testing Data
Evaluation was performed using programming-related prompts covering:
- Python debugging
- Code generation
- Code explanation
- Refactoring tasks
Metrics
Evaluation focused primarily on qualitative assessment:
- Instruction-following capability
- Code correctness
- Response quality
- Programming relevance
Results
In this qualitative assessment, the model gave better responses on coding-focused tasks than the untuned base model and followed software engineering workflows more closely. No quantitative benchmark scores are reported yet.
Environmental Impact
Hardware Type
NVIDIA GPU
Cloud Provider
Google Colab
Compute Region
Not specified
Carbon Emitted
Not measured
Technical Specifications
Model Architecture
Transformer-based autoregressive language model.
Base Architecture
Qwen2.5-Coder-1.5B
Objective
Next-token prediction optimized for coding and software engineering tasks.
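Next-token prediction trains the model to assign high probability to each token given the tokens before it. The loss is the cross-entropy averaged over positions. The following plain-Python sketch shows the shift by one between logits and labels. The -100 ignore index mirrors the default that Hugging Face Transformers uses for masked labels. The card does not state whether prompt tokens were masked during CodeMate-Qwen's fine-tuning.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def next_token_loss(logits_per_position, labels, ignore_index=-100):
    """Mean cross-entropy for predicting each token from the ones before it.

    logits_per_position[t] scores every vocabulary entry as the token at
    position t + 1, so predictions drop the last position and targets drop
    the first (the usual shift by one). Labels equal to ignore_index, for
    example masked prompt tokens, do not contribute to the loss.
    """
    total, count = 0.0, 0
    for logits, target in zip(logits_per_position[:-1], labels[1:]):
        if target == ignore_index:
            continue
        total -= log_softmax(logits)[target]
        count += 1
    return total / count if count else 0.0
```

With uniform logits over a vocabulary of V tokens, every prediction costs log V. The loss approaches zero only when the model puts nearly all probability on the correct next token.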
Compute Infrastructure
Hardware
Google Colab GPU Environment
Software
- Python
- PyTorch
- Transformers
- PEFT
- Hugging Face Hub
Citation
@misc{moses2026codemateqwen,
author = {Michael Moses},
title = {CodeMate-Qwen: A LoRA Fine-Tuned Coding Assistant Based on Qwen2.5-Coder-1.5B},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/micymike}
}
Model Card Authors
Michael Moses
Contact
GitHub: https://github.com/micymike
Email: mosesmichael878@gmail.com
Future Work
Planned improvements include:
- Larger instruction datasets
- Quantized deployments
- Benchmark evaluation on HumanEval and MBPP
- Additional programming language support
- Interactive web demo
- Advanced code review capabilities
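HumanEval and MBPP are usually scored with pass@k. For each problem, you generate n samples, count the c samples that pass the unit tests, and estimate the probability that at least one of k samples passes. The following sketch implements the unbiased estimator introduced with HumanEval (Chen et al., 2021) as background for the planned evaluation. It reports no results, and the function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples passing the unit tests, k: sample budget.
    Equals 1 - C(n - c, k) / C(n, k), i.e. one minus the probability that a
    random size-k subset of the n samples contains no passing sample.
    math.comb returns 0 when k > n - c, which yields 1.0 in that case.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given as (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimate reduces to c / n, the fraction of passing samples. Larger k rewards a model that solves a problem in at least some of its attempts.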