Model Details

This model is a fine-tuned language model designed to identify and extract Human Phenotype Ontology (HPO) terms from clinical text. It is trained using an Alpaca-style instruction format, allowing it to map medical descriptions to their corresponding HPO terms, IDs, and definitions.

Function to generate responses

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

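# Placeholder: set this to the local directory or Hub ID of this model
model_path = "path/to/model"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)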

def generate_response(instruction, input_text=""):
    # Create the Alpaca-style prompt; the Input section is added only
    # when input_text is non-empty
    prompt = (
        "You are an expert at identifying HPO ids. "
        "Provide the most accurate HPO id for the given input.\n\n"
        f"### Instruction:\n{instruction}\n\n"
    )
    if input_text.strip():
        prompt += f"### Input:\n{input_text}\n\n"
    prompt += "### Response:\n"

    # Tokenize input prompt
    inputs = tokenizer(prompt, return_tensors="pt")
    
    # Generate response
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=512,  # Caps generated tokens only (max_length would also count the prompt)
            temperature=0.7,  # Controls randomness
            top_p=0.9,  # Nucleus sampling
            do_sample=True  # Enables sampling
        )
    
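    # Decode only the newly generated tokens. Slicing the decoded string by
    # len(prompt) is fragile because decoding may not reproduce the prompt exactly.
    prompt_length = inputs["input_ids"].shape[1]
    response = tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True)

    return response.strip()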

# Test the model with an example instruction
instruction = "Extract the Human Phenotype Ontology (HPO) details from the following clinical context. Provide the HPO Term, HPO ID, and HPO Definition."
input_text = "An anomaly of the intracellular membrane complexes known as the dense tubular system."

response = generate_response(instruction, input_text)
print("\nGenerated Response:\n", response)
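
Because the response is free text, it can help to pull the HPO IDs out with a small post-processing step. The following is a minimal sketch, not part of the model itself: it assumes the response contains identifiers in the standard HPO form ("HP:" followed by seven digits), and the helper name extract_hpo_ids and the sample string are hypothetical.

```python
import re

# HPO identifiers have the form "HP:" followed by exactly seven digits.
HPO_ID_PATTERN = re.compile(r"\bHP:\d{7}\b")

def extract_hpo_ids(text):
    """Return the unique HPO IDs found in text, in order of first appearance."""
    seen = []
    for match in HPO_ID_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen

# Hypothetical response string for illustration only (not real model output)
sample = "HPO Term: Example term\nHPO ID: HP:0000001\nHPO Definition: ..."
print(extract_hpo_ids(sample))
```

In practice the helper would be applied to the string returned by generate_response. An empty list signals that the model produced no well-formed ID, which is worth checking for when sampling is enabled.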

Safetensors

Model size: 0.1B params

Tensor type: F32