Libraries

How to use micymike/codemate-qwen-1.5B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="micymike/codemate-qwen-1.5B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly (the model is a text-only causal LM, so use AutoModelForCausalLM)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("micymike/codemate-qwen-1.5B")
model = AutoModelForCausalLM.from_pretrained("micymike/codemate-qwen-1.5B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use micymike/codemate-qwen-1.5B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "micymike/codemate-qwen-1.5B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "micymike/codemate-qwen-1.5B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
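
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on its default port 8000. The helper names `build_chat_request` and `ask` are hypothetical and are not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "micymike/codemate-qwen-1.5B"
# Assumption: same host and port as the curl call above.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt: str, base_url: str = BASE_URL,
                       model: str = MODEL_ID) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt and return the reply text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    # OpenAI-style responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` would return the reply as a string. For the SGLang server, the only change would be a `base_url` that points at port 30000.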

Use Docker

docker model run hf.co/micymike/codemate-qwen-1.5B

SGLang

How to use micymike/codemate-qwen-1.5B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "micymike/codemate-qwen-1.5B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "micymike/codemate-qwen-1.5B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "micymike/codemate-qwen-1.5B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "micymike/codemate-qwen-1.5B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use micymike/codemate-qwen-1.5B with Docker Model Runner:
```
docker model run hf.co/micymike/codemate-qwen-1.5B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

CodeMate-Qwen

Model Details

Model Description

CodeMate-Qwen is a coding-focused language model fine-tuned from Qwen2.5-Coder-1.5B using Low-Rank Adaptation (LoRA). The model is designed to assist developers with code generation, debugging, code explanation, refactoring, and software engineering tasks.

The project was created to explore parameter-efficient fine-tuning techniques and build a lightweight coding assistant capable of supporting real-world development workflows.

Developed by

Michael Moses

Funded by

Self-funded personal research project.

Shared by

Michael Moses

Model Type

Causal Language Model (LLM) for Code Generation and Software Engineering Assistance.

Language(s)

English
Programming Languages:
- Python
- JavaScript
- TypeScript
- HTML
- CSS
- SQL
- General programming concepts

License

Apache 2.0 (subject to the licensing terms of the base Qwen model).

Finetuned From

Qwen/Qwen2.5-Coder-1.5B

Model Sources

Repository

GitHub: https://github.com/micymike

Hugging Face

https://huggingface.co/micymike

Demo

Coming Soon

Uses

Direct Use

This model is intended for:

- Code generation
- Debugging assistance
- Programming education
- Code explanation
- Refactoring recommendations
- Developer productivity workflows
- AI-assisted software development

Downstream Use

Potential downstream applications include:

- Coding copilots
- Educational coding assistants
- Automated code review systems
- Software engineering support tools
- Programming tutors

Out-of-Scope Use

This model is not intended for:

- Legal advice
- Medical advice
- Financial decision-making
- Safety-critical systems
- Autonomous code deployment without human review

Generated code should always be reviewed and tested before production use.

Bias, Risks, and Limitations

Like all large language models, CodeMate-Qwen may:

- Generate incorrect code
- Produce insecure implementations
- Hallucinate APIs or libraries
- Miss edge cases
- Reflect biases present in training data

Users should validate all generated outputs before deployment.
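
As one possible first gate before human review, generated Python can be checked for syntax errors without executing it. The sketch below is an illustration, not part of the model or its tooling: `extract_python` and `syntax_error` are hypothetical helper names, and the sketch assumes the model wraps code in a Markdown fence. A clean parse says nothing about correctness or security.

```python
import ast
import re
from typing import Optional

FENCE = "`" * 3  # Markdown code fence, built from single backticks
CODE_BLOCK = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_python(response: str) -> str:
    """Return the first fenced code block in a response, or the whole text if none."""
    match = CODE_BLOCK.search(response)
    return match.group(1) if match else response


def syntax_error(code: str) -> Optional[str]:
    """Return None if the code parses as Python, else a short error description."""
    try:
        ast.parse(code)  # parses only; the code is never executed
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


# Hand-written stand-in for a model response, used only for illustration.
response = "Here you go:\n" + FENCE + "python\ndef double(x):\n    return 2 * x\n" + FENCE
problem = syntax_error(extract_python(response))
print("parses cleanly" if problem is None else f"rejected: {problem}")
```

Code that passes this check still needs tests and a developer's review before it is deployed.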

Recommendations

The model performs best when:

- Prompts are clear and specific
- Sufficient context is provided
- Outputs are reviewed by a developer

The model should be considered an assistant rather than a replacement for software engineering expertise.

How to Get Started

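from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "micymike/codemate-qwen-merged"

# Load the tokenizer and the model weights
tokenizer = AutoTokenizer.from_pretrained(model_name)

# device_map="auto" places the weights on the available device(s); it requires the accelerate package
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto"
)

prompt = "Write a Python function that checks if a number is prime."

# Move the input tensors to the model's device before generating
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256
)

# The decoded text contains the prompt followed by the generated completion
print(tokenizer.decode(outputs[0], skip_special_tokens=True))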

Training Details

Training Data

The training dataset consisted of instruction-response pairs focused on software engineering and programming-related tasks.

Examples included:

- Bug fixing
- Code generation
- Code explanation
- Refactoring
- Programming Q&A
- Developer workflow assistance

Training Procedure

The model was fine-tuned using LoRA (Low-Rank Adaptation), which keeps the base model's weights frozen and trains only a small number of added low-rank adapter parameters.
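
To give a sense of how few parameters LoRA trains, the sketch below counts the adapter parameters for a hypothetical configuration. This card does not state the LoRA rank or target modules, so the rank of 8, the choice of `q_proj` and `v_proj`, and the layer dimensions are all illustrative assumptions rather than the settings used for CodeMate-Qwen.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix.

    The frozen weight W is updated as W + B @ A, where A has shape
    (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


# Illustrative assumptions, NOT values taken from this model card:
RANK = 8                     # LoRA rank
NUM_LAYERS = 28              # transformer layers
HIDDEN = 1536                # hidden size (input and output width of q_proj)
KV_DIM = 256                 # output width of v_proj
BASE_PARAMS = 1_500_000_000  # nominal "1.5B" base model size

# Adapted projections per layer as (d_in, d_out) pairs.
adapted = {"q_proj": (HIDDEN, HIDDEN), "v_proj": (HIDDEN, KV_DIM)}

per_layer = sum(lora_param_count(d_in, d_out, RANK) for d_in, d_out in adapted.values())
total = per_layer * NUM_LAYERS

print(f"LoRA trainable parameters: {total:,}")
print(f"Share of base model: {total / BASE_PARAMS:.4%}")
```

Under these assumptions the adapters amount to well under one percent of the base model's parameters, which is what makes fine-tuning feasible on a single Colab GPU.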

Training Regime

- Base model: Qwen2.5-Coder-1.5B
- Fine-tuning method: LoRA
- Framework: Hugging Face Transformers
- Adapter library: Hugging Face PEFT
- Backend: PyTorch

Evaluation

Testing Data

Evaluation was performed using programming-related prompts covering:

- Python debugging
- Code generation
- Code explanation
- Refactoring tasks

Metrics

Evaluation focused primarily on qualitative assessment:

- Instruction-following capability
- Code correctness
- Response quality
- Programming relevance
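
Code correctness in particular can be made measurable by running each generated solution against assertion-based tests. The sketch below shows one hypothetical way to do that; it is not the procedure used to evaluate this model. `passes_tests` and `pass_rate` are illustrative names, and the two candidates are hand-written stand-ins rather than real model outputs. Executing untrusted generated code with `exec` is unsafe outside a sandbox.

```python
def passes_tests(candidate_source: str, test_source: str) -> bool:
    """Run the candidate, then the tests, in one throwaway namespace."""
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)  # defines the candidate function(s)
        exec(test_source, namespace)       # raises AssertionError on a failed test
    except Exception:
        return False
    return True


def pass_rate(candidates: list, test_source: str) -> float:
    """Fraction of candidate solutions that pass every test."""
    passed = sum(passes_tests(source, test_source) for source in candidates)
    return passed / len(candidates)


# Hand-written stand-ins for two model completions of the prime-check prompt.
correct = (
    "def is_prime(n):\n"
    "    if n < 2:\n"
    "        return False\n"
    "    return all(n % d for d in range(2, int(n ** 0.5) + 1))\n"
)
buggy = (
    "def is_prime(n):\n"
    "    return all(n % d for d in range(2, n))\n"  # wrongly reports 1 as prime
)

tests = (
    "assert is_prime(2)\n"
    "assert is_prime(13)\n"
    "assert not is_prime(1)\n"
    "assert not is_prime(15)\n"
)

print(f"pass rate: {pass_rate([correct, buggy], tests):.2f}")
```

Benchmarks such as HumanEval and MBPP score models on the same principle, at larger scale and with sandboxed execution.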

Results

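In qualitative review, the fine-tuned model gave better responses than the base Qwen2.5-Coder-1.5B model on coding-focused prompts and aligned more closely with software engineering workflows. No quantitative benchmark scores are reported; HumanEval and MBPP evaluation is listed under Future Work.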

Environmental Impact

Hardware Type

NVIDIA GPU

Cloud Provider

Google Colab

Compute Region

Not specified

Carbon Emitted

Not measured

Technical Specifications

Model Architecture

Transformer-based autoregressive language model.

Base Architecture

Qwen2.5-Coder-1.5B

Objective

Next-token prediction optimized for coding and software engineering tasks.

Compute Infrastructure

Hardware

Google Colab GPU Environment

Software

- Python
- PyTorch
- Transformers
- PEFT
- Hugging Face Hub

Citation

@misc{moses2026codemateqwen,
  author = {Michael Moses},
  title = {CodeMate-Qwen: A LoRA Fine-Tuned Coding Assistant Based on Qwen2.5-Coder-1.5B},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/micymike}
}

Model Card Authors

Michael Moses

Contact

GitHub: https://github.com/micymike

Email: mosesmichael878@gmail.com

Future Work

Planned improvements include:

- Larger instruction datasets
- Quantized deployments
- Benchmark evaluation on HumanEval and MBPP
- Additional programming language support
- Interactive web demo
- Advanced code review capabilities

Safetensors

Model size

2B params

Tensor type

BF16
