Pharma TinyLlama Instruction LoRA Adapter

This repository contains a LoRA adapter produced by instruction fine-tuning on pharma-domain instruction-response data.

This adapter was trained on top of the Stage 1 merged model:

Base model for this stage: ssuvetha/pharma-tinyllama-non-instruction-merged

The adapter therefore builds on a model that was already domain-adapted on raw pharma text, and teaches it to answer in instruction/response format.

Model Type

Stage: 2
Training type: Instruction fine-tuning
Adapter type: LoRA
Training method: QLoRA-style fine-tuning
Task: Instruction-following pharma response generation

What this stage adds

Stage 1 taught the model pharma language and domain style.
Stage 2 teaches the model how to respond to prompts like:

Explain the mechanism of action of metformin.
Why do atorvastatin and ezetimibe work well together?
Summarize the role of lipid nanoparticles in mRNA vaccines.

This stage improves:

instruction following
response formatting
pharma-domain Q&A style outputs
domain-aware assistant behavior

Training data format

The training data is formatted like:

### Instruction:
Explain the mechanism of action of metformin.

### Response:
Metformin primarily activates AMPK...

Examples that carry an optional input field are formatted as:

### Instruction:
Summarize the following finding.

### Input:
<extra context here>

### Response:
...
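
Both layouts above can be produced by one small helper. The following is a minimal sketch, not the project's actual preprocessing code: the name `format_example` and its argument names are hypothetical, and the blank line between sections is assumed from the samples shown.

```python
def format_example(instruction, response, input_text=None):
    """Render one record in the '### Instruction / ### Input / ### Response' layout."""
    parts = [f"### Instruction:\n{instruction.strip()}"]
    # The input section is only emitted when extra context is provided
    if input_text and input_text.strip():
        parts.append(f"### Input:\n{input_text.strip()}")
    parts.append(f"### Response:\n{response.strip()}")
    # Sections are separated by one blank line, as in the samples
    return "\n\n".join(parts)


example = format_example(
    "Summarize the following finding.",
    "...",
    input_text="<extra context here>",
)
print(example)
```

Leaving `input_text` unset yields the two-section variant, so a single function covers both kinds of record.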

Intended use

This adapter is intended for:

pharma-domain instruction tuning experiments
educational chatbot research
Stage 2 in a multi-stage fine-tuning pipeline
domain-specific assistant prototyping

Not intended use

This model is not intended for:

medical diagnosis
treatment recommendations
clinical deployment
emergency or safety-critical use

Training pipeline summary

The high-level Stage 2 pipeline was:

Load the Stage 1 merged model
Load pharma instruction dataset
Format examples into instruction-response text
Tokenize and pad to fixed length
Add a fresh LoRA adapter
Fine-tune on instruction data
Save and upload adapter
Merge the adapter into the base model for Stage 3 preference tuning
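
The "tokenize and pad to fixed length" step can be illustrated without any library. This is a sketch of the idea only: the helper `pad_or_truncate` is hypothetical, right-padding is an assumption, and the real pipeline presumably relied on the tokenizer's own padding and truncation options rather than hand-written code.

```python
def pad_or_truncate(token_ids, max_length, pad_id):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = list(token_ids)[:max_length]   # truncate sequences that are too long
    mask = [1] * len(ids)                # 1 marks a real token
    padding = max_length - len(ids)
    ids += [pad_id] * padding            # right-pad short sequences (assumed side)
    mask += [0] * padding                # 0 marks padding
    return ids, mask


ids, mask = pad_or_truncate([5, 6, 7], max_length=6, pad_id=0)
print(ids)   # [5, 6, 7, 0, 0, 0]
print(mask)  # [1, 1, 1, 0, 0, 0]
```

With a maximum length of 512, every training example ends up as a 512-token sequence regardless of how long the original text was.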

Training configuration summary

Base model: ssuvetha/pharma-tinyllama-non-instruction-merged
Max length: 512
LoRA rank (r): 16
LoRA alpha: 32
LoRA dropout: 0.05
Learning rate: 1e-4
Batch size per device: 1
Gradient accumulation steps: 8
Max steps: 5
Quantization: 4-bit NF4
Hardware: Google Colab T4 GPU
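
The figures above imply a very small training budget. The arithmetic below is a sketch under two assumptions not stated in this card: training ran on a single GPU, and the adapter uses the standard LoRA scaling factor of alpha divided by rank.

```python
config = {
    "lora_r": 16,
    "lora_alpha": 32,
    "per_device_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "max_steps": 5,
    "max_length": 512,
}
num_devices = 1  # assumption: one Colab T4

# Sequences consumed per optimizer step
effective_batch = (
    config["per_device_batch_size"]
    * config["gradient_accumulation_steps"]
    * num_devices
)
# Total sequences seen over the whole run
sequences_seen = effective_batch * config["max_steps"]
# Upper bound on tokens processed, padding included
max_tokens_seen = sequences_seen * config["max_length"]
# Standard LoRA scaling applied to the low-rank update (assumed, not rank-stabilized)
lora_scaling = config["lora_alpha"] / config["lora_r"]

print(effective_batch)  # 8
print(sequences_seen)   # 40
print(max_tokens_seen)  # 20480
print(lora_scaling)     # 2.0
```

Under these assumptions the run performed 5 optimizer updates over 40 padded sequences, which is worth keeping in mind when judging output quality.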

How to use

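Load this adapter on top of the merged Stage 1 model. The example below loads the base model in 4-bit NF4, which requires the bitsandbytes package in addition to transformers and peft.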

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base_model_name = "ssuvetha/pharma-tinyllama-non-instruction-merged"
adapter_name = "ssuvetha/pharma-tinyllama-instruction-lora-adapter"

# 4-bit NF4 quantization, matching the training configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Reuse the end-of-sequence token for padding if the tokenizer defines no pad token
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the quantized Stage 1 merged model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the Stage 2 LoRA adapter and switch to inference mode
model = PeftModel.from_pretrained(base_model, adapter_name)
model.eval()

Example inference

prompt = """### Instruction:
Explain the primary mechanism of action of metformin.

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# The decoded text contains the prompt followed by the generated response
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Prompt format

Use this adapter with instruction-style prompts:

### Instruction:
<your question>

### Response:

Optional input variant:

### Instruction:
<your task>

### Input:
<extra context>

### Response:
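
Because generation returns the prompt together with the continuation, it can help to build prompts and strip them from the output programmatically. The helpers below are a hypothetical sketch (`build_prompt` and `extract_response` are not part of this repository), and cutting the output at the next "###" header is an assumption about how the model may run on.

```python
RESPONSE_MARKER = "### Response:"


def build_prompt(instruction, input_text=None):
    """Build an inference prompt that ends right after the response header."""
    parts = [f"### Instruction:\n{instruction.strip()}"]
    if input_text:
        parts.append(f"### Input:\n{input_text.strip()}")
    parts.append(RESPONSE_MARKER + "\n")
    return "\n\n".join(parts)


def extract_response(decoded_text):
    """Return the text after the first response header."""
    _, found, tail = decoded_text.partition(RESPONSE_MARKER)
    if not found:
        # No header present: fall back to the whole text
        return decoded_text.strip()
    # Assumption: drop anything from the next '###' section header onward
    return tail.split("\n###", 1)[0].strip()


prompt = build_prompt("Explain the primary mechanism of action of metformin.")
# Stand-in for decoded model output; the answer text is a placeholder
decoded = prompt + "<model answer>\n### Instruction:\n<run-on text>"
print(extract_response(decoded))  # <model answer>
```

Passing `input_text` produces the three-section variant shown above.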

Limitations

trained on a small instruction dataset for only 5 optimizer steps
may hallucinate scientific details
may produce plausible but incorrect medical content
not safety aligned for clinical use
not a substitute for licensed medical expertise

Project pipeline context

This adapter is part of a staged pharma fine-tuning project:

Stage 1: non-instruction domain adaptation
Stage 2: instruction fine-tuning
Stage 3: preference tuning with DPO

This repository contains the Stage 2 adapter only.

Citation

If you use this model, please cite:

TinyLlama
PEFT / LoRA / QLoRA
this project's repository or notebook
