language:
  - en
license: apache-2.0
pipeline_tag: text-generation
tags:
  - healthcare
  - diabetes
model-index:
  - name: HAH 2024 v0.11
    results:
      - task:
          name: Text Generation
          type: text-generation
        dataset:
          name: Custom Dataset (3000 review articles on diabetes)
          type: diabetes
        metrics:
          - name: Placeholder Metric for Development
            type: Placeholder Type
            value: 0
model-description:
  short-description: >-
    HAH 2024 v0.11 is a state-of-the-art language model fine-tuned specifically
    for generating text based on diabetes-related content. Leveraging a dataset
    constructed from 3000 open-source review articles, this model provides
    informative and contextually relevant answers to various queries about
    diabetes care, research, and therapies.
intended-use:
  primary-use: HAH 2024 v0.11 is intended for research purposes only.
  secondary-potential-uses:
    - >-
      A prototype for researchers to assess (not for formal use in real-life
      cases) the generation of educational content for patients and the
      general public about diabetes care and management.
    - >-
      Assessing the use of adapters to assist researchers in summarizing
      large volumes of diabetes-related literature.
limitations:
  - >-
    While HAH 2024 v0.11 excels at generating contextually appropriate responses,
    it may occasionally produce outputs that require further verification.
  - >-
    The training dataset, being limited to published articles, might not capture
    all contemporary research or emerging trends in diabetes care.
training-data:
  description: >-
    The training data for HAH 2024 v0.11 consists of 3000 open-source review
    articles about diabetes, carefully curated to cover a wide range of topics
    within the field. The dataset was enriched with questions generated through
    prompting OpenAI GPT-4 to ensure diversity in content and perspectives.
training-procedure:
  description: >-
    HAH 2024 v0.11 was fine-tuned on an A100 GPU using Google Colab. The
    fine-tuning process was carefully monitored to maintain the model's
    relevance to diabetes-related content while minimizing biases that might
    arise from the dataset's specific nature.

# Model Card for HAH 2024 v0.11


## Model Details

### Model Description

The aim of HAH 2024 v0.11 is to assess how an advanced language model, fine-tuned for generating insights from diabetes-related healthcare data, performs. HAH 2024 v0.11 is intended for research purposes only.

- Developed by: Dr M As'ad
- Funded by: Self-funded
- Model type: Transformer-based language model
- Language(s) (NLP): English
- License: Apache-2.0
- Finetuned from model: Mistral 7B Instruct v0.2

## Uses

### Direct Use

HAH 2024 v0.11 is designed to assess performance for direct use in a chat interface within the diabetes domain.

### Downstream Use

The model can also be fine-tuned for specialized tasks, such as subtypes or subgroups within the diabetes field.
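As a sketch of one such adaptation path, a parameter-efficient fine-tuning setup with LoRA adapters (using the `peft` library) might be configured as below. All hyperparameters and target modules here are illustrative assumptions, not the settings used to train this model.

```python
from peft import LoraConfig

# Illustrative LoRA adapter settings (assumptions, not this model's training config)
lora_config = LoraConfig(
    r=16,                  # rank of the low-rank update matrices
    lora_alpha=32,         # scaling factor applied to the adapter update
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```

The resulting config would then be passed to `peft.get_peft_model` together with the loaded base model before training on the specialized dataset.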

### Out-of-Scope Use

This model is not recommended for non-English text or contexts outside of healthcare. It is a research project and is not intended for deployment in any real chat interface.

## Bias, Risks, and Limitations

The model may inherently carry biases from the training data related to diabetes literature, potentially reflecting the geographic and demographic focus of the sources.

### Recommendations

Users should verify the model-generated information with current medical guidelines and consider a manual review for sensitive applications.

## How to Get Started with the Model

Use the code below to get started with the model:

```python
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("drmasad/HAH_2024_v0.11")
tokenizer = AutoTokenizer.from_pretrained("drmasad/HAH_2024_v0.11")

# System instruction and user prompt
instructions = ("You are an expert endocrinologist. Answer the query in "
                "accurate, informative language any patient can understand.")
user_prompt = "What is diabetic retinopathy?"

# Set up a text-generation pipeline
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)

# Mistral Instruct format: the instruction and the query both go inside
# a single [INST] ... [/INST] block
result = pipe(f"<s>[INST] {instructions}\n{user_prompt} [/INST]")

# The model's answer is the text generated after the closing [/INST] tag
generated_text = result[0]["generated_text"]
answer = generated_text.split("[/INST]")[-1].strip()
print(answer)
```
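For reuse across queries, the prompt formatting and answer extraction can be wrapped in small helpers. The function names here are hypothetical; the `[INST]` tags follow the Mistral Instruct convention of the base model, which expects the instruction and query inside a single `[INST] ... [/INST]` block:

```python
def build_prompt(system_instruction: str, user_query: str) -> str:
    """Format a single-turn prompt in the Mistral Instruct style,
    placing the instruction and query inside one [INST] block."""
    return f"<s>[INST] {system_instruction}\n{user_query} [/INST]"

def extract_answer(generated_text: str) -> str:
    """Return only the text the model produced after the closing [/INST] tag."""
    return generated_text.split("[/INST]")[-1].strip()

prompt = build_prompt("You are an expert endocrinologist.",
                      "What is diabetic retinopathy?")
print(prompt)
```

These helpers keep the chat-template details in one place, so a change to the prompt format only has to be made once.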