pharma-tinyllama-instruction-merged

Model Summary

This is the Stage 2 merged model from the llm-finetuning-playbook pipeline.

It was produced by instruction fine-tuning (SFT) the Stage 1 domain-adapted model (SivaSai8143/pharma-tinyllama-non-instruction-merged) on Alpaca-style pharma instruction/response pairs using QLoRA (4-bit NF4). The LoRA adapter was then merged back into the base weights.

This merged model is the base for Stage 3 (preference tuning / DPO).
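
The merge step described above can be written out explicitly. As a sketch in standard LoRA notation (the symbols $W_0$, $A$, and $B$ are illustrative and do not come from the training code), each adapted weight matrix is replaced by the base weight plus the scaled low-rank update:

$$
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d_{\text{out}} \times r},\quad A \in \mathbb{R}^{r \times d_{\text{in}}}
$$

Assuming the default LoRA scaling of $\alpha / r$, the rank $r = 16$ and $\alpha = 32$ listed under Training Details give a scale factor of $32 / 16 = 2$. Once merged, the checkpoint carries no separate adapter, which is why it loads as an ordinary causal language model.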


Pipeline Position

TinyLlama-1.1B (base)
        ↓  Stage 1: Non-Instruction FT
pharma-tinyllama-non-instruction-merged
        ↓  Stage 2: Instruction FT / SFT (this model)
pharma-tinyllama-instruction-merged  ← you are here
        ↓  Stage 3: Preference Tuning (DPO)
pharma-tinyllama-dpo-merged

Training Details

| Parameter | Value |
|---|---|
| Base model | SivaSai8143/pharma-tinyllama-non-instruction-merged |
| Method | QLoRA (4-bit NF4, double quantization) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q/k/v/o_proj, gate/up/down_proj |
| Max sequence length | 512 tokens |
| Epochs | 5 |
| Max steps | 5 |
| Batch size | 1 (gradient accumulation 8, effective batch size 8) |
| Learning rate | 1e-4 |
| Warmup steps | 2 |
| Weight decay | 0.01 |
| Environment | Google Colab T4 GPU |
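
The table lists both an epoch count and a step limit, so it helps to work out what the numbers imply together. The sketch below is plain arithmetic on the values above, not the actual training script. It assumes that all 48 pairs are used for training (no held-out split) and that the step limit acts as a hard cap on optimizer updates, as `max_steps` does in Hugging Face `TrainingArguments` when set to a positive value. The variable names are illustrative.

```python
import math

# Values copied from the Training Details table and the Training Data section.
num_examples = 48            # instruction/response pairs in the dataset
per_device_batch_size = 1
grad_accum_steps = 8
num_epochs = 5
max_steps = 5
lora_rank = 16
lora_alpha = 32

# Examples that contribute to a single optimizer update.
effective_batch = per_device_batch_size * grad_accum_steps

# Optimizer updates needed for one full pass over the data
# (assumption: every pair is in the training split).
steps_per_epoch = math.ceil(num_examples / effective_batch)

# Assumption: the step limit caps training regardless of the epoch count.
planned_steps = steps_per_epoch * num_epochs
actual_steps = min(planned_steps, max_steps)
examples_seen = actual_steps * effective_batch

# Default LoRA scaling applied to the low-rank update.
lora_scale = lora_alpha / lora_rank

print(effective_batch, steps_per_epoch, actual_steps, examples_seen, lora_scale)
```

Under these assumptions the run performs 5 optimizer updates over 40 examples, which is less than one full pass over the 48 pairs, so the 5-epoch setting would not be reached.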

Training Data

Trained on SivaSai8143/pharma-finetuning-data (config: instruction).

48 instruction/response pairs formatted in Alpaca style:

### Instruction:
<instruction>

### Response:
<output>

Covering:

  • Metformin pharmacology, pharmacokinetics, safety & clinical use
  • Lipid-lowering therapy (Atorvastatin + Ezetimibe), familial hypercholesterolemia
  • mRNA vaccine platforms and immune response
  • AI in drug discovery, lead optimization, ADME/toxicology
  • Clinical trial terminology and pharmacovigilance
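
The Alpaca-style layout shown above can be produced with a small helper. This is a minimal sketch, not the preprocessing code from the training notebook: `format_example` is a hypothetical name, and the card does not say whether an end-of-sequence token was appended during training, so none is added here.

```python
# Template matching the layout in this card: instruction header, blank line,
# response header, then the output text.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"


def format_example(instruction: str, output: str = "") -> str:
    """Render one instruction/response pair in the Alpaca-style layout.

    With an empty output, the result is an inference prompt that ends
    right after the "### Response:" header.
    """
    return PROMPT_TEMPLATE.format(instruction=instruction.strip()) + output.strip()


# Training-style text: prompt plus the reference answer.
print(format_example("Explain the primary mechanism of action of metformin.", "<output>"))
```

Calling the helper with only an instruction yields the same prompt string used in the Usage section, which keeps training and inference formats aligned.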

Related Artifacts

| Artifact | Link |
|---|---|
| Stage 2 merged model | pharma-tinyllama-instruction-merged |
| Stage 1 merged model | pharma-tinyllama-non-instruction-merged |
| Stage 3 merged model | pharma-tinyllama-dpo-merged |
| Training notebook | llm-finetuning-playbook |
| Dataset | pharma-finetuning-data |

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "SivaSai8143/pharma-tinyllama-instruction-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package to be installed.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Prompt in the same Alpaca-style format used for training.
prompt = """### Instruction:
Explain the primary mechanism of action of metformin.

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Sample up to 150 new tokens.
outputs = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7)
# The decoded text contains the prompt followed by the generated response.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
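
Because the decoded text includes the prompt, it is often convenient to keep only the answer. The helper below is a minimal sketch using plain string handling. `extract_response` is a hypothetical name, and the sample string stands in for real model output.

```python
RESPONSE_MARKER = "### Response:"
INSTRUCTION_MARKER = "### Instruction:"


def extract_response(decoded: str) -> str:
    """Return only the text after the first "### Response:" header.

    Also cuts at a following "### Instruction:" header, in case the model
    keeps generating a new turn. If no response header is found, the input
    is returned with surrounding whitespace stripped.
    """
    _, found, tail = decoded.partition(RESPONSE_MARKER)
    if not found:
        return decoded.strip()
    answer, _, _ = tail.partition(INSTRUCTION_MARKER)
    return answer.strip()


# Placeholder text standing in for tokenizer.decode(...) output.
decoded = (
    "### Instruction:\nExplain the primary mechanism of action of metformin.\n\n"
    "### Response:\n<generated text>\n\n### Instruction:\n<extra turn>"
)
print(extract_response(decoded))
```

An alternative is to slice `outputs[0]` past the prompt length before decoding. The string-based version is shown here because it does not depend on tensor shapes.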

Disclaimer

Educational fine-tuning project for demonstrating LLM training pipelines. The pharma content is for technical demonstration only and is not medical advice.

