Model Card for BioGPT-FineTuned-MedicalTextbooks-FP16
Model Overview
This model is a fine-tuned and quantized version of the microsoft/biogpt model, specifically tailored for medical text understanding. It was fine-tuned on the dmedhi/medical-textbooks dataset from Hugging Face and subsequently quantized to FP16 (half-precision) to reduce memory usage and improve inference speed while largely preserving accuracy. The model is designed for tasks such as keyword extraction from medical texts and generative tasks in the biomedical domain.
Model Details
Base Model: microsoft/biogpt
Fine-Tuning Dataset: dmedhi/medical-textbooks (15,970 rows)
Quantization: FP16 (half-precision) using PyTorch's .half() method
Model Type: Causal Language Model
Language: English
Intended Use
This model is intended for:
- Keyword Extraction: Extracting relevant lines containing specific keywords (e.g., "anatomy") from medical textbooks, along with metadata like book names.
- Generative Tasks: Generating short explanations or summaries in the biomedical domain (e.g., answering questions like "What is anatomy?").
- Research and Education: Assisting researchers, students, and educators in exploring medical texts and generating insights.
Out of Scope
- Real-time clinical decision-making or medical diagnosis (not evaluated for such tasks).
- Non-English text processing (not tested on other languages).
- Tasks requiring high precision in generative output without human oversight.
Training Details
Dataset
The model was fine-tuned on the dmedhi/medical-textbooks dataset, which contains excerpts from medical textbooks with two attributes:
- text: The content of the excerpt.
- book: The name of the book (e.g., "Gray's Anatomy").
Dataset Splits:
- Original split: train (15,970 rows).
- Custom splits: 80% train (12,776 rows), 20% validation (3,194 rows).
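The split can be reproduced with a few lines of the datasets library. This is a minimal sketch; the seed value is illustrative, not necessarily the one used in the original run.
from datasets import load_dataset

# Load the original single-split dataset from the Hugging Face Hub
raw_datasets = load_dataset("dmedhi/medical-textbooks")

# Carve a 20% validation set out of the original train split (seed is illustrative)
splits = raw_datasets["train"].train_test_split(test_size=0.2, seed=42)
train_ds = splits["train"]   # ~12,776 rows
val_ds = splits["test"]      # ~3,194 rows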
Training Procedure
Preprocessing:
- Tokenized the text field using the BioGPT tokenizer (microsoft/biogpt).
- Set max_length=512, with truncation and padding.
- Used input_ids as labels for causal language modeling.
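A minimal sketch of this preprocessing step, assuming the train_ds/val_ds splits from the snippet above; the function name is illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/biogpt")

def preprocess(batch):
    # Tokenize excerpts to a fixed length of 512 tokens with truncation and padding
    tokens = tokenizer(batch["text"], max_length=512, truncation=True, padding="max_length")
    # For causal language modeling, the labels are simply the input IDs
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

tokenized_train = train_ds.map(preprocess, batched=True, remove_columns=train_ds.column_names)
tokenized_val = val_ds.map(preprocess, batched=True, remove_columns=val_ds.column_names)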
Fine-Tuning:
- Fine-tuned microsoft/biogpt using Hugging Face's Trainer API.
Training arguments:
- Epochs: 1
- Batch size: 4 per device
- Learning rate: 2e-5
- Mixed precision: FP16 (fp16=True)
- Evaluation strategy: Steps (every 1000 steps)
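A minimal sketch of the corresponding Trainer setup, assuming the tokenized splits from the preprocessing snippet; the output directory is illustrative.
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("microsoft/biogpt")

training_args = TrainingArguments(
    output_dir="./biogpt_finetuned",   # illustrative output directory
    num_train_epochs=1,
    per_device_train_batch_size=4,
    learning_rate=2e-5,
    fp16=True,                         # mixed-precision training
    evaluation_strategy="steps",
    eval_steps=1000,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
)
trainer.train()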
Training loss decreased from 2.8409 to 2.7006 over 3,194 steps.
Validation loss decreased from 2.7317 to 2.6512.
Quantization:
- Converted the fine-tuned model to FP16 using PyTorch's .half() method.
- Saved as ./biogpt_finetuned/final_model_fp16.
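A minimal sketch of the conversion and save step, assuming the fine-tuned model and its tokenizer are still in memory:
# Cast all floating-point weights to half precision
model = model.half()

# Save the FP16 model together with its tokenizer for later reloading
save_path = "./biogpt_finetuned/final_model_fp16"
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)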
Compute Infrastructure
- Hardware: 12 GB GPU (NVIDIA)
- Environment: Jupyter Notebook on Windows
- Framework: PyTorch, Hugging Face Transformers
- Training Time: Approximately 27 minutes for 1 epoch
Evaluation
Metrics
Training Loss: Decreased from 2.8409 to 2.7006.
Validation Loss: Decreased from 2.7317 to 2.6512.
Memory Usage: Post-quantization memory usage reported as ~661 MB (FP16), though actual savings may vary due to buffers and non-weight tensors.
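The parameter footprint can be checked directly in PyTorch. This sketch assumes the FP16 model has been loaded as model (see Usage below) and counts parameter tensors only, which is why the reported figure excludes buffers and other non-weight tensors.
# Sum the bytes held by parameter tensors (FP16 = 2 bytes per element)
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"Parameter memory: {param_bytes / 1024**2:.0f} MB")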
Qualitative Testing
Generative Task: Generated a response to "What is anatomy?" with reasonable output: "What is anatomy? Anatomy is the basis of medicine..."
Keyword Extraction: Successfully extracted up to 10 lines containing keywords (e.g., "anatomy") with corresponding book names (e.g., "Gray's Anatomy").
Usage
Installation
- Ensure you have the required libraries installed:
pip install transformers torch datasets sacremoses
Loading the Model
- Load the quantized FP16 model and tokenizer:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_path = "path/to/biogpt_finetuned/final_model_fp16" # Update with your HF repo path
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)  # load the saved FP16 weights without upcasting to FP32
tokenizer = AutoTokenizer.from_pretrained(model_path)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()
Example 1: Generative Inference
Generate text with the quantized model:
input_text = "What is anatomy?"
inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True, max_length=512)
inputs = {k: v.to(device) for k, v in inputs.items()}
with torch.no_grad():
    outputs = model.generate(**inputs, max_length=50)
output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(output_text)
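Note that generate defaults to greedy decoding; passing sampling arguments such as do_sample=True, top_p, or temperature produces more varied (but less deterministic) output.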
Example 2: Keyword Extraction
from datasets import load_from_disk
original_datasets = load_from_disk('path/to/original_medical_textbooks')  # Update with your local copy of the dataset
def extract_lines_with_keyword(keyword, dataset_split='train', max_results=10):
    dataset = original_datasets[dataset_split]
    matching_lines = []
    for entry in dataset:
        text = entry['text']
        book = entry['book']
        # Scan each line of the excerpt for a case-insensitive keyword match
        lines = text.split('\n')
        for line in lines:
            if keyword.lower() in line.lower():
                matching_lines.append({'text': line.strip(), 'book': book})
                # Stop early once the requested number of matches is collected
                if len(matching_lines) >= max_results:
                    return matching_lines
    return matching_lines
keyword = "anatomy"
matching_lines = extract_lines_with_keyword(keyword)
for i, match in enumerate(matching_lines, 1):
    print(f"{i}. Text: {match['text']}")
    print(f" Book: {match['book']}\n")
Limitations
- Quantization Trade-offs: FP16 quantization may lead to minor accuracy degradation, though not extensively evaluated.
- Dataset Bias: Fine-tuned only on dmedhi/medical-textbooks, which may not cover all medical domains or topics.
- Generative Quality: Generative outputs may require human oversight for correctness.
- Scalability: Keyword extraction relies on string matching, not semantic understanding, limiting its ability to capture nuanced relationships.