Pharma TinyLlama Instruction LoRA Adapter

This repository contains a LoRA adapter produced by instruction fine-tuning on pharma-domain instruction-response data.

This adapter was trained on top of the Stage 1 merged model:

  • Base model for this stage: ssuvetha/pharma-tinyllama-non-instruction-merged

This adapter therefore builds on a model that was already domain-adapted on raw pharma text, and further trains it to respond in instruction/response format.


Model Type

  • Stage: 2
  • Training type: Instruction fine-tuning
  • Adapter type: LoRA
  • Training method: QLoRA-style fine-tuning
  • Task: Instruction-following pharma response generation

What this stage adds

Stage 1 taught the model pharma language and domain style.
Stage 2 teaches the model how to respond to prompts like:

  • Explain the mechanism of action of Metformin.
  • Why do atorvastatin and ezetimibe work well together?
  • Summarize the role of lipid nanoparticles in mRNA vaccines.

This stage improves:

  • instruction following
  • response formatting
  • pharma-domain Q&A style outputs
  • domain-aware assistant behavior

Training data format

Each training example is formatted as:

### Instruction:
Explain the mechanism of action of Metformin.

### Response:
Metformin primarily activates AMPK...

Examples that carry an optional input field are formatted as:

### Instruction:
Summarize the following finding.

### Input:
<extra context here>

### Response:
...
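
The layout above can be produced with a small helper. This is a minimal sketch, not the formatting code used for training: the function name `format_example` and the parameter name `context` are hypothetical, and whether the real pipeline appends an EOS token after the response is not stated in this card.

```python
from typing import Optional


def format_example(instruction: str, response: str = "", context: Optional[str] = None) -> str:
    """Render one record in the '### Instruction / ### Input / ### Response' layout."""
    parts = [f"### Instruction:\n{instruction.strip()}"]
    if context:
        # The optional '### Input:' block is only emitted when context is given.
        parts.append(f"### Input:\n{context.strip()}")
    # With an empty response the text ends right after '### Response:' plus a
    # newline, which is the shape of an inference prompt.
    parts.append(f"### Response:\n{response.strip()}")
    return "\n\n".join(parts)
```

Calling `format_example(instruction, response)` gives a training string, while leaving `response` empty gives a prompt that stops at the response header so the model can complete it.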

Intended use

This adapter is intended for:

  • pharma-domain instruction tuning experiments
  • educational chatbot research
  • Stage 2 in a multi-stage fine-tuning pipeline
  • domain-specific assistant prototyping

Not intended use

This model is not intended for:

  • medical diagnosis
  • treatment recommendations
  • clinical deployment
  • emergency or safety-critical use

Training pipeline summary

The high-level Stage 2 pipeline was:

  1. Load the Stage 1 merged model
  2. Load the pharma instruction dataset
  3. Format examples into instruction-response text
  4. Tokenize and pad to a fixed length
  5. Add a fresh LoRA adapter
  6. Fine-tune on the instruction data
  7. Save and upload the adapter
  8. Merge the adapter into the base model for Stage 3 preference tuning
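
Step 4 (tokenize and pad to fixed length) can be illustrated on plain lists of token ids. The sketch below is library-free and is not the project's training code: the helper name `pad_to_fixed_length` is hypothetical, and right-padding plus masking padded positions out of the loss with `-100` are assumptions about a typical causal-LM setup rather than details confirmed by this card.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value that PyTorch cross-entropy ignores by default


def pad_to_fixed_length(token_ids: List[int], max_length: int, pad_id: int) -> Dict[str, List[int]]:
    """Truncate or right-pad one tokenized example to exactly max_length tokens."""
    ids = token_ids[:max_length]
    n_pad = max_length - len(ids)
    return {
        "input_ids": ids + [pad_id] * n_pad,
        # 1 marks real tokens, 0 marks padding.
        "attention_mask": [1] * len(ids) + [0] * n_pad,
        # Assumption: padded positions are excluded from the loss.
        "labels": ids + [IGNORE_INDEX] * n_pad,
    }
```

With the configuration in this card, `max_length` would be 512. In practice a tokenizer call with padding and truncation options does the first two parts of this in one step.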

Training configuration summary

  • Base model: ssuvetha/pharma-tinyllama-non-instruction-merged
  • Max length: 512
  • LoRA rank (r): 16
  • LoRA alpha: 32
  • LoRA dropout: 0.05
  • Learning rate: 1e-4
  • Batch size per device: 1
  • Gradient accumulation steps: 8
  • Max steps: 5
  • Quantization: 4-bit NF4
  • Hardware: Google Colab T4 GPU
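
To put these numbers in perspective, the short calculation below derives the effective batch size, the number of sequences seen, and the LoRA scaling factor from the values listed above. It assumes a single GPU (`num_devices = 1`, consistent with one Colab T4) and the standard LoRA scaling of alpha / r; both are assumptions rather than statements from the training notebook.

```python
# Values copied from the configuration list above.
per_device_batch_size = 1
gradient_accumulation_steps = 8
max_steps = 5
max_length = 512
lora_r = 16
lora_alpha = 32

# Assumption: a single GPU, so there is no data-parallel multiplier.
num_devices = 1

effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices
sequences_seen = effective_batch_size * max_steps
# Upper bound on tokens processed: every sequence is padded or truncated to max_length.
max_tokens_seen = sequences_seen * max_length
# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = lora_alpha / lora_r

print(effective_batch_size, sequences_seen, max_tokens_seen, lora_scaling)
# prints: 8 40 20480 2.0
```

Under these assumptions the run covers 40 sequences in total, which is consistent with the small-scale, experimental nature of this adapter.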

How to use

Load this adapter on top of the merged Stage 1 model.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base_model_name = "ssuvetha/pharma-tinyllama-non-instruction-merged"
adapter_name = "ssuvetha/pharma-tinyllama-instruction-lora-adapter"

# 4-bit NF4 quantization, matching the training configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Reuse EOS as the padding token if the tokenizer does not define one.
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the merged Stage 1 model in 4-bit.
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the Stage 2 LoRA adapter and switch to inference mode.
model = PeftModel.from_pretrained(base_model, adapter_name)
model.eval()

Example inference

prompt = """### Instruction:
Explain the primary mechanism of action of metformin.

### Response:
"""

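# Tokenize the prompt and move the tensors to the model's device.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 150 new tokens without tracking gradients.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# The decoded text contains the prompt followed by the generated response.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))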
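
Because the decoded output includes the prompt, it can be convenient to keep only the text after the response header. The helper below is a minimal sketch in plain Python: the name `extract_response` is hypothetical, and trimming at a second `### Instruction:` marker is an assumption about how a small model might run on past its answer, not an observed behavior of this adapter.

```python
RESPONSE_MARKER = "### Response:"
INSTRUCTION_MARKER = "### Instruction:"


def extract_response(decoded: str) -> str:
    """Return only the text generated after the first '### Response:' marker."""
    _, found, tail = decoded.partition(RESPONSE_MARKER)
    if not found:
        # Marker missing: fall back to the full decoded text.
        return decoded.strip()
    # Cut off any follow-on turn the model may start generating on its own.
    tail = tail.split(INSTRUCTION_MARKER, 1)[0]
    return tail.strip()
```

It would be applied to the decoded string, for example `extract_response(tokenizer.decode(outputs[0], skip_special_tokens=True))`.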

Prompt format

Use this adapter with instruction-style prompts:

### Instruction:
<your question>

### Response:

Optional input variant:

### Instruction:
<your task>

### Input:
<extra context>

### Response:

Limitations

  • trained on a small instruction dataset for only 5 optimizer steps
  • may hallucinate scientific details
  • may produce plausible but incorrect medical content
  • not safety-aligned for clinical use
  • not a substitute for licensed medical expertise

Project pipeline context

This adapter is part of a staged pharma fine-tuning project:

  • Stage 1: non-instruction domain adaptation
  • Stage 2: instruction fine-tuning
  • Stage 3: preference tuning with DPO

This repository contains the Stage 2 adapter only.


Citation

If you use this model, please cite:

  • TinyLlama
  • PEFT / LoRA / QLoRA
  • this project's repository or notebook