language:
  - en
  - zh
license: llama2
library_name: transformers
tags:
  - llama
  - merge
  - medical
datasets:
  - GBaker/MedQA-USMLE-4-options
  - cognitivecomputations/samantha-data
  - shibing624/medical
base_model:
  - Severus27/BeingWell_llama2_7b
  - ParthasarathyShanmugam/llama-2-7b-samantha
pipeline_tag: text-generation
model-index:
  - name: Dr_Samantha-7b
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 53.84
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/Dr_Samantha-7b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 77.95
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/Dr_Samantha-7b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 47.94
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/Dr_Samantha-7b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 45.58
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/Dr_Samantha-7b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 73.56
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/Dr_Samantha-7b
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 18.8
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=sethuiyer/Dr_Samantha-7b
          name: Open LLM Leaderboard

Dr. Samantha

Overview

Dr. Samantha is a language model made by merging Severus27/BeingWell_llama2_7b and ParthasarathyShanmugam/llama-2-7b-samantha using mergekit.

It combines the capabilities of a medical-knowledge-focused model (trained on USMLE data and doctor-patient interactions) with the philosophical, psychological, and relational understanding of the Samantha-7b model.

As both a medical consultant and a personal counselor, Dr. Samantha could support both physical and mental well-being, which is important for whole-person care.

YAML Config

slices:
  - sources:
      - model: Severus27/BeingWell_llama2_7b
        layer_range: [0, 32]
      - model: ParthasarathyShanmugam/llama-2-7b-samantha
        layer_range: [0, 32]

merge_method: slerp
base_model: TinyPixel/Llama-2-7B-bf16-sharded

parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5 # fallback for rest of tensors
tokenizer_source: union

dtype: bfloat16
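The config above merges with SLERP (spherical linear interpolation) and a per-layer interpolation factor t: the five-element value lists for self_attn and mlp act as gradients spread across the 32 layers, and every other tensor uses the fallback t = 0.5. The Python sketch below illustrates both ideas with NumPy. It is a textbook SLERP plus an assumed piecewise-linear spreading of the gradient anchors and an assumed substring match on tensor names, not mergekit's actual source; the helper names layer_t, t_for_tensor, and slerp are hypothetical. Here t = 0 keeps the first tensor and t = 1 the second; which source model mergekit treats as which endpoint is defined by mergekit itself.

```python
import numpy as np

# Gradient anchors copied from the config above.
ATTN_T = [0, 0.5, 0.3, 0.7, 1]
MLP_T = [1, 0.5, 0.7, 0.3, 0]
FALLBACK_T = 0.5


def layer_t(values, layer_idx, num_layers):
    """Hypothetical helper: spread the anchors evenly over the layer stack
    and linearly interpolate t for one layer (an assumption, not mergekit code)."""
    if num_layers == 1:
        return float(values[0])
    x = layer_idx / (num_layers - 1)                 # position in [0, 1]
    anchors = np.linspace(0.0, 1.0, len(values))     # evenly spaced anchor points
    return float(np.interp(x, anchors, values))


def t_for_tensor(name, layer_idx, num_layers=32):
    """Hypothetical filter: pick t by (assumed) substring match on the tensor name."""
    if "self_attn" in name:
        return layer_t(ATTN_T, layer_idx, num_layers)
    if "mlp" in name:
        return layer_t(MLP_T, layer_idx, num_layers)
    return FALLBACK_T


def slerp(t, v0, v1, dot_threshold=0.9995, eps=1e-8):
    """Textbook spherical linear interpolation between two weight tensors."""
    v0 = np.asarray(v0, dtype=np.float64)
    v1 = np.asarray(v1, dtype=np.float64)
    # Measure the angle between directions using unit-normalized copies.
    u0 = v0 / max(np.linalg.norm(v0), eps)
    u1 = v1 / max(np.linalg.norm(v1), eps)
    dot = float(np.clip(np.sum(u0 * u1), -1.0, 1.0))
    if abs(dot) > dot_threshold:
        # Nearly (anti)parallel: sin(theta) is ~0, so fall back to plain lerp.
        return (1.0 - t) * v0 + t * v1
    theta = np.arccos(dot)
    sin_theta = np.sin(theta)
    s0 = np.sin((1.0 - t) * theta) / sin_theta
    s1 = np.sin(t * theta) / sin_theta
    return s0 * v0 + s1 * v1
```

For example, slerp(t_for_tensor("model.layers.10.self_attn.q_proj.weight", 10), weight_a, weight_b) would blend one attention matrix from two hypothetical arrays weight_a and weight_b. With these anchors, attention weights move from one endpoint at the bottom layers to the other at the top, while MLP weights go the opposite way.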

Prompt Template

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
What is your name?

### Response:
My name is Samantha.
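A minimal Python sketch that builds this prompt and extracts the answer from the generated text. The exact whitespace (one blank line between sections, a newline after "### Response:") is assumed from the template as shown, and build_prompt and extract_response are hypothetical helpers.

```python
# Assumed rendering of the prompt template shown above.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_prompt(instruction: str) -> str:
    """Fill the instruction slot of the template."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


def extract_response(generated: str) -> str:
    """Return the text after the last '### Response:' marker, cut at the
    next '###' header in case the model keeps generating new turns."""
    _, _, tail = generated.rpartition("### Response:")
    return tail.split("###", 1)[0].strip()
```

With Hugging Face transformers, the prompt could be passed to pipeline("text-generation", model="sethuiyer/Dr_Samantha-7b"). Its generated_text includes the prompt by default, which is why extract_response splits on the last "### Response:" marker.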

⚡ Quantized models

GGUF: https://huggingface.co/TheBloke/Dr_Samantha-7B-GGUF
GPTQ: https://huggingface.co/TheBloke/Dr_Samantha-7B-GPTQ
AWQ: https://huggingface.co/TheBloke/Dr_Samantha-7B-AWQ

Thanks to TheBloke for making these available!

Dr. Samantha is now available on Ollama. To use it, run the command ollama run stuehieyr/dr_samantha in your terminal. If you have limited computing resources, check out this video to learn how to run it on a Google Colab backend.
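Besides the interactive terminal session, a running Ollama server also exposes a local HTTP API. A minimal standard-library Python sketch, assuming the server listens on its default address http://localhost:11434 and that the model has already been downloaded (for example by the ollama run command above); the helper names are hypothetical. Ollama normally wraps the prompt in the template stored with the model (if one is configured), so a plain instruction is passed here rather than the full template.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default port.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"
MODEL = "stuehieyr/dr_samantha"


def build_generate_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST to Ollama's /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL, timeout: float = 300.0) -> str:
    """Send the request and return the text from the JSON 'response' field."""
    with urllib.request.urlopen(build_generate_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, generate("What is your name?") returns the model's reply as a single string; setting stream to False trades incremental output for a simpler one-shot JSON response.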

Open LLM Leaderboard Performance

T	Model	Average	ARC	HellaSwag	MMLU	TruthfulQA	Winogrande	GSM8K
1	sethuiyer/Dr_Samantha-7b	52.95	53.84	77.95	47.94	45.58	73.56	18.8
2	togethercomputer/LLaMA-2-7B-32K-Instruct	50.02	51.11	78.51	46.11	44.86	73.88	5.69
3	togethercomputer/LLaMA-2-7B-32K	47.07	47.53	76.14	43.33	39.23	71.9	4.32
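The Average column appears to be the unweighted mean of the six benchmark scores. A minimal Python sketch that recomputes it from the rows above; leaderboard_average is a hypothetical helper, and the equal weighting is an assumption based on the numbers matching.

```python
# Scores copied from the table above, in column order.
BENCHMARKS = ("ARC", "HellaSwag", "MMLU", "TruthfulQA", "Winogrande", "GSM8K")

ROWS = {
    "sethuiyer/Dr_Samantha-7b": (53.84, 77.95, 47.94, 45.58, 73.56, 18.8),
    "togethercomputer/LLaMA-2-7B-32K-Instruct": (51.11, 78.51, 46.11, 44.86, 73.88, 5.69),
    "togethercomputer/LLaMA-2-7B-32K": (47.53, 76.14, 43.33, 39.23, 71.9, 4.32),
}


def leaderboard_average(scores):
    """Unweighted mean over the six benchmarks (assumed to match the Average column)."""
    if len(scores) != len(BENCHMARKS):
        raise ValueError(f"expected {len(BENCHMARKS)} scores, got {len(scores)}")
    return sum(scores) / len(scores)


for model, scores in ROWS.items():
    print(f"{model}: {leaderboard_average(scores):.2f}")
```

For the Instruct row this gives about 50.03 against the listed 50.02, a gap likely explained by the leaderboard averaging unrounded per-benchmark scores.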

Subject-wise Accuracy

Subject	Accuracy (%)
Clinical Knowledge	52.83
Medical Genetics	49.00
Human Aging	58.29
Human Sexuality	55.73
College Medicine	38.73
Anatomy	41.48
College Biology	52.08
College Medicine	38.73
High School Biology	53.23
Professional Medicine	38.73
Nutrition	50.33
Professional Psychology	46.57
Virology	41.57
High School Psychology	66.60
Average	48.85
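The Average row can be checked with a short Python sketch. Taking the unweighted mean of every row exactly as listed, including the repeated College Medicine row, reproduces 48.85; averaging the 13 distinct subjects instead gives about 49.63. That the reported figure was computed over the listed rows is an inference from this arithmetic, not something stated above; mean_accuracy and dedupe are hypothetical helpers.

```python
# Rows copied from the Subject-wise Accuracy table, in order
# (College Medicine appears twice there and is kept twice here).
SUBJECT_ACCURACY = [
    ("Clinical Knowledge", 52.83),
    ("Medical Genetics", 49.00),
    ("Human Aging", 58.29),
    ("Human Sexuality", 55.73),
    ("College Medicine", 38.73),
    ("Anatomy", 41.48),
    ("College Biology", 52.08),
    ("College Medicine", 38.73),
    ("High School Biology", 53.23),
    ("Professional Medicine", 38.73),
    ("Nutrition", 50.33),
    ("Professional Psychology", 46.57),
    ("Virology", 41.57),
    ("High School Psychology", 66.60),
]


def mean_accuracy(rows):
    """Unweighted mean accuracy over (subject, accuracy) rows."""
    return sum(acc for _, acc in rows) / len(rows)


def dedupe(rows):
    """Keep the first row for each subject name, preserving order."""
    seen = set()
    unique = []
    for name, acc in rows:
        if name not in seen:
            seen.add(name)
            unique.append((name, acc))
    return unique


print(f"{mean_accuracy(SUBJECT_ACCURACY):.2f}")          # all 14 listed rows
print(f"{mean_accuracy(dedupe(SUBJECT_ACCURACY)):.2f}")  # 13 distinct subjects
```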

Evaluation by GPT-4 on 25 random prompts from the ChatDoctor-200k dataset

Overall Rating: 83.5/100

Pros:

Demonstrates extensive medical knowledge through accurate identification of potential causes for various symptoms.
Responses consistently emphasize the importance of seeking professional diagnoses and treatments.
Advice to consult specialists for certain concerns is well-reasoned.
Practical interim measures provided for symptom management in several cases.
Consistent display of empathy, support, and reassurance for patients' well-being.
Clear and understandable explanations of conditions and treatment options.
Prompt responses addressing all aspects of medical inquiries.

Cons:

Could occasionally place stronger emphasis on urgency when symptoms indicate potential emergencies.
Discussion of differential diagnoses could explore a broader range of less common causes.
Details around less common symptoms and their implications need more depth at times.
Opportunities exist to gather clarifying details on symptom histories through follow-up questions.
Consider exploring full medical histories to improve diagnostic context where relevant.
Caution levels and risk factors associated with certain conditions could be underscored more.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	52.95
AI2 Reasoning Challenge (25-Shot)	53.84
HellaSwag (10-Shot)	77.95
MMLU (5-Shot)	47.94
TruthfulQA (0-shot)	45.58
Winogrande (5-shot)	73.56
GSM8k (5-shot)	18.80