HuggingFaceH4/CodeAlpaca_20K
GemCod is a lightweight code generation model fine-tuned with supervised fine-tuning (SFT) on the base gemma-3-270m-it model (https://huggingface.co/google/gemma-3-270m-it). It offers accurate and quick(ish) code snippet generation in all major programming languages. Its small size (270M parameters) allows it to run comfortably on laptop-grade GPUs.
Estimated parameters: ~270M
Architecture: Gemma3
Intended use: Code snippet generation from natural language
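As a rough sanity check on the laptop-GPU claim, the weight footprint can be estimated from the parameter count alone. The sketch below assumes exactly 270M parameters and the usual bytes-per-parameter for each dtype; `weight_memory_gib` is a hypothetical helper, and the figures cover weights only (no activations, KV cache, or framework overhead).

```python
# Back-of-the-envelope weight memory for a ~270M-parameter model.
# Assumption: parameter count is taken as exactly 270M for illustration.
PARAMS = 270_000_000

# Standard storage size of one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: {weight_memory_gib(PARAMS, dtype):.2f} GiB")
```

Under these assumptions the weights take about 1 GiB in float32 and about half that in bfloat16 or float16, which is consistent with running on a modest GPU.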
Install requirements:
pip install -r requirements.txt
pip install torch transformers datasets accelerate safetensors
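After installing, it can help to confirm that the packages are importable before downloading the model. This is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper, and the package list (including `torch`, which the usage code relies on) is an assumption based on the install command above.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed list: the packages from the pip command, plus torch.
    required = ["torch", "transformers", "datasets", "accelerate", "safetensors"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```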
You can load it directly from Hugging Face:
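import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("DireDreadlord/GemCod-codegen-270M")
model = AutoModelForCausalLM.from_pretrained("DireDreadlord/GemCod-codegen-270M")
model.to(device)
model.eval()
# Keep the embedding matrix in sync with the tokenizer's vocabulary size.
model.resize_token_embeddings(len(tokenizer))

# Fall back to the EOS token for padding if the tokenizer defines no pad token.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Plain "User:" / "Assistant:" chat template.
chat_template = """{% for message in messages %}{% if message['role'] == 'user' %}User: {{ message['content'] }}
{% elif message['role'] == 'assistant' %}Assistant: {{ message['content'] }}
{% endif %}{% endfor %}"""
tokenizer.chat_template = chat_template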

def generate_code(prompt: str, max_tokens: int = 256) -> str:
    """Generate a code snippet for a natural-language prompt using greedy decoding."""
    messages = [{"role": "user", "content": prompt}]
    formatted_prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device)
    input_length = inputs["input_ids"].shape[1]
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_tokens,  # cap on the number of generated tokens
            do_sample=False,  # greedy decoding, so output is deterministic
            num_beams=1,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id,
            use_cache=False,
        )
    # Drop the prompt tokens and decode only the newly generated ones.
    generated_tokens = outputs[0][input_length:]
    generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True)
    return generated_text

prompt = "give me a cpp function that prints the first n fibonacci numbers"
print("Prompt: ", prompt)
result = generate_code(prompt, max_tokens=256)
print(result)
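To make it easier to see what prompt string the model actually receives, the chat template used above can be mirrored in plain Python. This is a sketch of the template as written in this card, not part of the transformers API; `render_prompt` is a hypothetical helper. It shows two things that follow from the template text: each turn is rendered as a `User:` or `Assistant:` line ending in a newline, and messages with any other role (for example `system`) are dropped because the template has no branch for them.

```python
def render_prompt(messages):
    """Mirror the card's User:/Assistant: chat template in plain Python."""
    parts = []
    for message in messages:
        if message["role"] == "user":
            parts.append(f"User: {message['content']}\n")
        elif message["role"] == "assistant":
            parts.append(f"Assistant: {message['content']}\n")
        # Any other role is skipped, as in the template.
    return "".join(parts)


if __name__ == "__main__":
    example = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "write a python function that reverses a string"},
    ]
    # repr() makes the trailing newline visible.
    print(repr(render_prompt(example)))
```

Because the template contains no generation-prompt branch, passing `add_generation_prompt=True` should not append an `Assistant:` prefix; the model continues directly after the final newline.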