
datagemma-2b

datagemma-2b is a model for generating data science code from natural-language instructions. It is fine-tuned from the codegemma-2b model. Fine-tuning was performed on the ed001/ds-coder-instruct-v2 dataset, which was constructed by filtering publicly available datasets on Hugging Face.

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

# Load the fine-tuned model and move it to the GPU (requires CUDA).
model = AutoModelForCausalLM.from_pretrained(
    "ed001/datagemma-2b",
    low_cpu_mem_usage=True
).cuda()

# Load the matching tokenizer.
tokenizer = AutoTokenizer.from_pretrained("ed001/datagemma-2b", trust_remote_code=True)
tokenizer.padding_side = "right"

# Prompt template used by this model card.
prompt_template = "### Question: {}\n ### Answer: "
generation_config = GenerationConfig(max_new_tokens=512, top_p=0.5, do_sample=True, repetition_penalty=1.0)
prompt = "How can I profile speed of my neural network using PyTorch?"

# Tokenize the prompt and move the tensors to the model's device.
inputs = tokenizer(prompt_template.format(prompt), return_tensors="pt").to(model.device)

# Generate and print the completion (the decoded text includes the prompt).
output_ids = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(output_ids[0]))
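Because the decoded output contains the prompt as well as the completion, it can be convenient to wrap the template in small helpers. The following is a minimal sketch; `build_prompt` and `extract_answer` are hypothetical names introduced here for illustration and are not part of the model or of transformers.

```python
# Template copied from the usage example above.
PROMPT_TEMPLATE = "### Question: {}\n ### Answer: "
ANSWER_MARKER = "### Answer:"


def build_prompt(question: str) -> str:
    """Insert a natural-language question into the prompt template."""
    return PROMPT_TEMPLATE.format(question)


def extract_answer(decoded: str) -> str:
    """Return the text after the first answer marker, stripped of whitespace.

    If the marker is missing, the whole decoded string is returned stripped.
    """
    _, found, answer = decoded.partition(ANSWER_MARKER)
    return answer.strip() if found else decoded.strip()
```

With these helpers, the decoded string from the usage example can be passed through `extract_answer` to keep only the generated part. Passing `skip_special_tokens=True` to `tokenizer.decode` additionally drops special tokens from the decoded text.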

Training Details

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
target_modules: q, k, v, o, gate_proj, down_proj, up_proj
weight_decay: 0
optimizer: paged_adamw_8bit
lr: 1e-4
lr_scheduler: cosine
max_seq_len: 1536
batch_size: 1
grad_acc: 4
max_grad_norm: 0.5
warmup_ratio: 0.05
num_epochs: 1
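A few quantities follow directly from the values listed above. The sketch below computes them under stated assumptions: standard LoRA scaling (alpha / r), a single training device, and a schedule with linear warmup followed by cosine decay to zero. The card only says "cosine", so the exact schedule shape is an assumption, and the function names are hypothetical.

```python
import math

# Values copied from the training details above.
LORA_R = 32
LORA_ALPHA = 16
BATCH_SIZE = 1
GRAD_ACC = 4
PEAK_LR = 1e-4
WARMUP_RATIO = 0.05


def lora_scaling(alpha: float, r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return alpha / r


def effective_batch_size(batch_size: int, grad_acc: int, num_devices: int = 1) -> int:
    """Sequences contributing to each optimizer step (single device assumed)."""
    return batch_size * grad_acc * num_devices


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = PEAK_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Assumed schedule: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the LoRA update is scaled by 16 / 32 = 0.5 and each optimizer step sees 1 × 4 = 4 sequences. The total step count depends on the dataset size, which is not stated here, so `total_steps` has to be supplied by the caller.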

Contact

GitHub: Ea0011

Model size: 2.51B params (Safetensors)
Tensor type: FP16