---
language: en
license: mit
base_model: meta-llama/Llama-3.2-1B-Instruct
tags:
  - medical
  - india
  - healthcare
  - llama
  - text-generation
  - indian-healthcare
  - mental-health
  - merged
pipeline_tag: text-generation
---

# MedQuery-India-v1 (Merged)

**No Meta approval or Hugging Face login required to use this model!**

This is the **merged, standalone version** of the original `MedQuery-India-v1` QLoRA adapter. It is a fine-tuned version of **Llama-3.2-1B-Instruct** for Indian medical question answering. It covers AIIMS/NEET clinical protocols, Indian drug brands (Crocin, Dolo, Combiflam), regional diseases (dengue, typhoid, TB/DOTS, chikungunya), national health programs and guidelines (NTEP, NVBDCP, RSSDI, IAP), and mental health support with cultural sensitivity.

> *Why this exists:* Most open-source medical AI models are trained on PubMed and USMLE data and are optimized for Western clinical contexts. Indian patients ask about Dolo 650, not acetaminophen. They ask about DOTS, not generic TB regimens. This model is trained to close that gap.

---

## ⚡ Quick Start: One Cell, Any Notebook

Open in **Google Colab** (Runtime → Change runtime type → **T4 GPU**) or any Kaggle notebook and paste this single cell.

Since this is a merged model, it loads natively with standard `transformers`. Just change `QUESTION` to anything you want to ask!

```python
# ============================================================
# MedQuery-India-v1 (Merged): Direct Inference
# Works on Google Colab / Kaggle / any notebook with a T4 GPU
# ============================================================

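# --- Step 1: Install basic dependencies ---
import subprocess
import sys

# Use the notebook's own interpreter so packages land in the active environment
subprocess.run([sys.executable, "-m", "pip", "install", "-q", "transformers", "torch", "accelerate"], check=True)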

# --- Step 2: Load the model directly ---
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

MODEL_ID = "kanha98/medquery-india-v1-merged"

print("Downloading and loading the model (approx 2.5 GB)...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token

# Loading in float16 - Easily fits in free Colab T4 (15GB VRAM)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto"
)
print("✅ Model loaded successfully!")

# --- Step 3: Ask your question: change this line ↓ ---
QUESTION = "What are the warning signs of severe dengue?"
# -------------------------------------------------------

SYSTEM = (
    "You are MedQuery-India, a medical AI assistant trained on Indian healthcare context "
    "including AIIMS/NEET clinical protocols, Indian drug brands, regional diseases, "
    "Indian procedural guidelines (NTEP, NVBDCP, RSSDI, IAP), and mental health support. "
    "Answer accurately, safely, and with cultural sensitivity."
)

prompt = (
    f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n{SYSTEM}<|eot_id|>"
    f"<|start_header_id|>user<|end_header_id|>\n{QUESTION}<|eot_id|>"
    f"<|start_header_id|>assistant<|end_header_id|>\n"
)

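# The prompt already starts with <|begin_of_text|>, so skip the tokenizer's automatic BOS token
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)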
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=250,
        temperature=0.3,
        do_sample=True,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,
    )

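print("\n🩺 --- MedQuery-India Response ---")
# Decode only the newly generated tokens so the prompt is not echoed back
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True).strip())
```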

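To ask several questions without reloading the model, the prompt construction can be factored into a small helper. The sketch below is illustrative only: `build_prompt` and `SHORT_SYSTEM` are hypothetical names introduced here, and the template simply mirrors the string format used in the Quick Start cell.

```python
# Hypothetical helper: builds the same prompt string as the Quick Start cell.
# SHORT_SYSTEM is a shortened placeholder; substitute the full SYSTEM text in practice.
SHORT_SYSTEM = "You are MedQuery-India, a medical AI assistant."


def build_prompt(system: str, question: str) -> str:
    """Return a Llama-3-style prompt with system, user, and open assistant turns."""
    return (
        f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n{question}<|eot_id|>"
        f"<|start_header_id|>assistant<|end_header_id|>\n"
    )


questions = [
    "What are the warning signs of severe dengue?",
    "Patient took Combiflam for dengue fever. Is this dangerous?",
]
prompts = [build_prompt(SHORT_SYSTEM, q) for q in questions]
print(f"Built {len(prompts)} prompts")
```

Each string in `prompts` can then go through the same `tokenizer(...)` and `model.generate(...)` calls as the single-question cell.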
## Model Details

| Property | Value |
|---|---|
| Model format | Merged standalone (FP16/FP32) |
| Base model | meta-llama/Llama-3.2-1B-Instruct |
| Parameters | 1,235,814,400 (1.24B) |
| Original fine-tuning | QLoRA (4-bit NF4 quantization) |
| Original LoRA rank | r = 64 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj (7 modules) |
| Training hardware | Tesla T4 (Kaggle, 14.5 GB VRAM) |
| Final training loss | 1.5468 |

Note: The adapter weights from the original fine-tuning have been permanently merged into the base model's weights for ease of use.
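To give a sense of scale for a rank-64 adapter over these seven modules, the sketch below estimates the number of trainable LoRA parameters before merging. The layer shapes (hidden size, MLP width, key/value width, layer count) are assumptions taken from the publicly documented base-model configuration, not figures stated in this card, and `lora_params` is a hypothetical helper.

```python
# Rough estimate of LoRA adapter size at rank r = 64.
# ASSUMPTION: shapes below come from the public base-model config, not this card.
HIDDEN = 2048         # assumed hidden size
INTERMEDIATE = 8192   # assumed MLP width
KV_DIM = 512          # assumed key/value projection width (8 KV heads x 64 dims)
NUM_LAYERS = 16       # assumed number of decoder layers
RANK = 64             # LoRA rank from the table above
BASE_PARAMS = 1_235_814_400  # parameter count from the table above

# (in_features, out_features) for each adapted projection
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int, shapes: dict, num_layers: int) -> int:
    """Each adapted weight adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


trainable = lora_params(RANK, TARGET_SHAPES, NUM_LAYERS)
print(f"{trainable:,} adapter parameters "
      f"({100 * trainable / BASE_PARAMS:.2f}% of the base model)")
```

Under these assumed shapes the adapter is a few percent of the base model, and merging folds it into the existing weight matrices without adding parameters.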


## Why These Architecture Decisions

### Why Llama-3.2-1B-Instruct?

1. **Tokenizer efficiency on medical vocabulary.** Llama-3's 128k BPE vocabulary encodes medical terms like "acetaminophen", "thrombocytopenia", and "leptospirosis" as 1–2 tokens. GPT-2's 50k vocabulary splits the same terms into 4–6 tokens. Fewer tokens per medical term means the model sees more semantic context within the 512-token window.
2. **Grouped Query Attention (GQA).** Llama-3.2-1B shares each key/value head across several query heads (32 query heads over 8 key/value heads, a 4:1 ratio). This shrinks the KV cache by the same factor compared with standard multi-head attention, enabling longer context at the same VRAM cost.
3. **The 1B sweet spot.** Larger than SmolLM2-360M (better reasoning, longer coherent answers), smaller than 3B+ (fits a T4 easily). Every architectural decision in this model is explainable.
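The KV-cache saving from GQA can be checked with simple arithmetic. The sketch below is illustrative only: the layer count, head counts, and head dimension are assumptions taken from the publicly documented base-model configuration rather than from this card, and `kv_cache_bytes` is a hypothetical helper.

```python
# Rough KV-cache size for grouped-query vs. full multi-head attention.
# ASSUMPTION: shape values come from the public base-model config, not this card.
NUM_LAYERS = 16        # assumed decoder layers
NUM_QUERY_HEADS = 32   # assumed query heads
NUM_KV_HEADS = 8       # assumed key/value heads under GQA
HEAD_DIM = 64          # assumed per-head dimension
BYTES_PER_VALUE = 2    # float16


def kv_cache_bytes(num_kv_heads: int, seq_len: int) -> int:
    """Bytes needed to cache keys and values in every layer for one sequence."""
    keys_and_values = 2
    return keys_and_values * NUM_LAYERS * num_kv_heads * HEAD_DIM * seq_len * BYTES_PER_VALUE


seq_len = 512  # the context window mentioned above
gqa = kv_cache_bytes(NUM_KV_HEADS, seq_len)
mha = kv_cache_bytes(NUM_QUERY_HEADS, seq_len)  # as if every query head had its own K/V
print(f"GQA: {gqa / 2**20:.1f} MiB, MHA: {mha / 2**20:.1f} MiB, saving: {mha // gqa}x")
# prints: GQA: 16.0 MiB, MHA: 64.0 MiB, saving: 4x
```

The saving factor equals the ratio of query heads to key/value heads, so it depends entirely on the assumed head counts above.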

## Dataset Overview

Total training samples: 6,569. Validation: 780. Test: 780.

| Source | Samples | % | Why included |
|---|---|---|---|
| MedMCQA (Indian) | 3,613 | 55.0% | AIIMS/NEET exam questions with direct Indian clinical context |
| ChatDoctor | 1,588 | 24.2% | Real patient-doctor conversations that teach conversational tone |
| MedQuAD | 802 | 12.2% | NIH structured QA that adds reliable factual grounding |
| PubMedQA | 237 | 3.6% | Expert-annotated research QA that adds clinical reasoning |
| Synthetic Indian (general) | 144 | 2.2% | Indian drug names, regional disease context |
| Synthetic Indian (edge cases) | 135 | 2.1% | Drug safety edge cases specific to India |
| Synthetic Mental Health | 50 | 0.8% | NEET stress, exam pressure, Indian mental health context |

**Why MedMCQA at 55%?** It is the only large-scale dataset sourced directly from AIIMS and NEET PG entrance exams: real Indian clinical questions with expert explanations.

**Why synthetic data?** Public medical datasets have near-zero coverage of Indian drug brands, DOTS protocol specifics, or mental health in the Indian exam context. Hand-crafted synthetic samples fill this gap directly.
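The mixture percentages follow directly from the per-source sample counts. A minimal sketch that recomputes them, useful as a sanity check if the mixture is rebalanced (the dictionary name `SOURCE_COUNTS` is illustrative, while the counts are the ones listed above):

```python
# Recompute each source's share of the training set from its sample count.
SOURCE_COUNTS = {
    "MedMCQA (Indian)": 3613,
    "ChatDoctor": 1588,
    "MedQuAD": 802,
    "PubMedQA": 237,
    "Synthetic Indian (general)": 144,
    "Synthetic Indian (edge cases)": 135,
    "Synthetic Mental Health": 50,
}

total = sum(SOURCE_COUNTS.values())
shares = {name: round(100 * count / total, 1) for name, count in SOURCE_COUNTS.items()}

for name, pct in shares.items():
    print(f"{name:<32}{SOURCE_COUNTS[name]:>6}  {pct:>5.1f}%")
print(f"{'Total':<32}{total:>6}")
```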


## Sample Outputs

**Q: What are the warning signs of severe dengue?**

> Warning signs of severe dengue require immediate hospitalization. These include: extreme lethargy or inability to talk; persistent vomiting; severe abdominal pain; bleeding from the nose, gums, or mouth; very high fever above 40°C; black spots on the skin (petechiae); swelling in the face, neck, or groin; extremely low blood pressure. If you have any one of these symptoms, call your local hospital immediately.

**Q: Patient took Combiflam for dengue fever. Is this dangerous?**

> Combiflam contains ibuprofen (an NSAID) and paracetamol. Taking NSAIDs during dengue is dangerous because they increase the risk of bleeding by inhibiting platelet function (already compromised in dengue). For dengue fever, only paracetamol (like Crocin or Dolo 650) is safe. NSAIDs including Combiflam, Brufen, and Aspirin must be strictly avoided.

**Q: What is the DOTS program for tuberculosis in India?**

> DOTS (Directly Observed Treatment Short-course) is India's national TB control program under the National Tuberculosis Elimination Programme (NTEP), launched in 1992. Core components: sputum microscopy for diagnosis, a standardized short-course drug regimen, direct supervision of treatment, case detection at facility and community levels, and free treatment under the PM-JAY scheme. DOTS is implemented by state and district health departments under the Central TB Division, Ministry of Health and Family Welfare.


## Limitations

- **Not a substitute for medical advice.** This model is for research and educational purposes. Do not use it for clinical diagnosis or treatment decisions.
- **English only (v1).** Support for Hindi, Marathi, and Bengali is planned for v2.
- **1B parameter ceiling.** Complex multi-step clinical reasoning may produce errors, and the model may hallucinate on rare diseases.
- **Training data cutoff.** Drug approvals, protocol updates, or guideline changes that postdate the training data may not be reflected.

## Citation

If you use this model in research, please cite:

@misc{gupta2025medqueryindia,
  author = {Kanhayya Gupta},
  title  = {MedQuery-India-v1: Fine-Tuning of Llama-3.2-1B for Indian Medical QA},
  year   = {2026},
  url    = {https://huggingface.co/kanha98/medquery-india-v1-merged}
}

## Author

Kanhayya Gupta

