Apollo-2B-GGUF / README.md
metadata
pipeline_tag: text-generation
language: multilingual
license: apache-2.0
tags:
  - Multitask Language Understanding
  - Multilingual
widget:
  - text: >-
      In traditional Western medicine, which vitamin is commonly recommended to
      prevent scurvy? A) Vitamin A B) Vitamin B12 C) Vitamin C D) Vitamin D
    example_title: English
  - text: 在中医理论中,以下哪种药材不是治疗风湿病的常用药物? A) 独活 B) 秦艽 C) 甘草 D) 珍珠粉
    example_title: Chinese
  - text: >-
      السؤال: ما هو العلاج الطبيعي الذي يستخدم تقليديًا في الطب العربي لتحسين
      الهضم؟ A) الزنجبيل B) النعناع C) القرفة D) الحلبة
    example_title: Arabic
  - text: >-
      आयुर्वेद में, किस औषधि का उपयोग आमतौर पर जुकाम के इलाज के लिए किया जाता
      है? A) नीम B) तुलसी C) गिलोय D) अश्वगंधा
    example_title: Hindi
  - text: >-
      En la medicina tradicional española, ¿qué alimento se considera
      beneficioso para la salud del hígado? A) Aceite de oliva B) Tomate C) Foie
      gras (hígado de ganso) D) Ajo
    example_title: Spanish
  - text: >-
      Dans la tradition médicinale française, quel produit est réputé pour ses
      bienfaits sur la digestion ? A) Le vin rouge B) Le fromage C) Le foie gras
      D) Les herbes de Provence
    example_title: French

Multilingual Medicine: Model, Dataset, Benchmark, Code

Covering English, Chinese, French, Spanish, Hindi, and Arabic so far

👨🏻‍💻Github •📃 Paper • 🌐 Demo • 🤗 ApolloCorpus • 🤗 XMedBench
中文 | English

🌈 Update

  • [2024.03.07] Paper released.
  • [2024.02.12] ApolloCorpus and XMedBench are published!🎉
  • [2024.01.23] Apollo repo is published!🎉

Results

🤗 Apollo-0.5B • 🤗 Apollo-1.8B • 🤗 Apollo-2B • 🤗 Apollo-6B • 🤗 Apollo-7B

Dataset & Evaluation

  • Dataset 🤗 ApolloCorpus

    • Zip File
    • Data category
      • Pretrain:
        • data item:
          • json_name: {data_source}_{language}_{data_type}.json
          • data_source: medicalBook, medicalGuideline, medicalPaper, medicalWeb (from online forums), medicalWiki
          • language: en (English), zh (Chinese), es (Spanish), fr (French), hi (Hindi)
          • data_type: qa (QA pairs generated from text) or text
          • data_type==text: list of strings
            [
              "string1",
              "string2",
              ...
            ]
            
          • data_type==qa: list of QA pairs (each item is a flat list of strings)
            [
              [
                "q1",
                "a1",
                "q2",
                "a2",
                ...
              ],
              ...
            ]
            
      • SFT:
        • json_name: {data_source}_{language}.json
        • data_source: code, general, math, medicalExam, medicalPatient
        • data item: list of QA pairs (each item is a flat list of strings)
            [
              [
                "q1",
                "a1",
                "q2",
                "a2",
                ...
              ],
              ...
            ]
          
  • Evaluation 🤗 XMedBench

    • EN:

      • MedQA-USMLE
      • MedMCQA
      • PubMedQA: Not used in the paper because its results fluctuated too much.
      • MMLU-Medical
        • Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
    • ZH:

      • MedQA-MCMLE
      • CMB-single: Not used in the paper
        • 2,000 randomly sampled single-answer multiple-choice questions.
      • CMMLU-Medical
        • Anatomy, Clinical_knowledge, College_medicine, Genetics, Nutrition, Traditional_chinese_medicine, Virology
      • CExam: Not used in the paper
        • 2,000 randomly sampled multiple-choice questions.
    • ES: Head_qa

    • FR: Frenchmedmcqa

    • HI: MMLU_HI

      • Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
    • AR: MMLU_Ara

      • Clinical knowledge, Medical genetics, Anatomy, Professional medicine, College biology, College medicine
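
The ApolloCorpus layout described in the Dataset section can be read with a few lines of Python. The following is a minimal sketch, not the authors' loader: the helper names (`parse_json_name`, `to_pairs`, `load_qa_file`) are hypothetical, the example file names are illustrative only, and it assumes the Pretrain file-name fields are underscore-separated in the same way as the SFT pattern `{data_source}_{language}.json`.

```python
import json
from pathlib import Path


def parse_json_name(name):
    """Split a corpus file name into its fields.

    Assumption: fields are separated by underscores and contain none
    themselves. Three fields are read as a Pretrain file, two as an SFT file.
    """
    parts = Path(name).stem.split("_")
    if len(parts) == 3:
        keys = ("data_source", "language", "data_type")
    elif len(parts) == 2:
        keys = ("data_source", "language")
    else:
        raise ValueError(f"unexpected file name: {name!r}")
    return dict(zip(keys, parts))


def to_pairs(flat):
    """Turn a flat ["q1", "a1", "q2", "a2", ...] list into (question, answer) tuples."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of strings (q, a, q, a, ...)")
    return list(zip(flat[0::2], flat[1::2]))


def load_qa_file(path):
    """Load a qa-style file: a JSON list whose items are flat q/a string lists."""
    with open(path, encoding="utf-8") as f:
        items = json.load(f)
    return [to_pairs(item) for item in items]
```

Each element returned by `load_qa_file` is one multi-turn conversation, which makes it straightforward to map onto a chat template. Files with `data_type==text` need no such step, since `json.load` already yields the list of strings.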

Results reproduction

Waiting for Update

Citation

Please use the following citation if you intend to use our dataset for training or evaluation:

@misc{wang2024apollo,
   title={Apollo: Lightweight Multilingual Medical LLMs towards Democratizing Medical AI to 6B People},
   author={Xidong Wang and Nuo Chen and Junyin Chen and Yan Hu and Yidong Wang and Xiangbo Wu and Anningzhe Gao and Xiang Wan and Haizhou Li and Benyou Wang},
   year={2024},
   eprint={2403.03640},
   archivePrefix={arXiv},
   primaryClass={cs.CL}
}