JordiBayarri commited on
Commit
b2befb1
verified
1 Parent(s): ff83dac

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +9 -8
README.md CHANGED
@@ -44,11 +44,13 @@ Aloe is trained in 20 medical tasks, resulting in a robust and versatile healthc
44
 
45
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62f7a16192950415b637e201/VUYw4IdANKGrH2VOedwH0.png)
46
 
47
- Aloe-8B-Beta is the latest iteration in the Aloe family, building and improving on the success of its predecessor, [Aloe-8B-Alpha](https://huggingface.co/HPAI-BSC/Llama3-Aloe-8B-Alpha).
48
- Beta more than triples the training data used by Alpha, for a total of 1.8B tokens, including a wider variety of medical tasks and instructions (e.g., text summarization, explanation, diagnosis, text classification, treatment recommendation, ...).
49
 
50
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62f7a16192950415b637e201/bCuV5kZUT9H9UECAOWDRc.png)
51
 
 
 
52
  Beta also boosts the alignment and safety stages with respect to Alpha. This includes a [medical preference dataset](https://huggingface.co/datasets/TsinghuaC3I/UltraMedical-Preference), as well as the red-teaming dataset (available soon).
53
 
54
  Complete training details, model merging configurations, and all training data (including synthetically generated data) can be found below. This includes [the RAG system](https://github.com/HPAI-BSC/prompt_engine) that was developed to test Aloe Beta in a deployment setup. Aloe comes with a healthcare-specific risk assessment to facilitate to the safe use and deployment of such systems.
@@ -77,10 +79,9 @@ Aloe Beta has been tested on the most popular healthcare QA datasets, with and w
77
 
78
  The Beta model has been developed to excel in several different medical tasks. For this reason, we evaluated the model in many different medical tasks:
79
 
 
80
 
81
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/ZABYUxpQRMDcrJmKhkEfz.png)
82
-
83
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/2NW3im0aH2u6RKp969sjx.png)
84
 
85
  We also compared the performance of the model in the general domain, using the OpenLLM Leaderboard benchmark. Aloe-Beta gets competitive results with the current SOTA general models in the most used general benchmarks and outperforms the medical models:
86
 
@@ -236,12 +237,12 @@ We used Deepspeed's Zero-3 distributed training using the following hardware:
236
 
237
  The training set consists of around 1.8B tokens, having 3 different types of data:
238
 
239
- - Medical domain datasets:
240
  - [HPAI-BSC/Aloe-Beta-General-Collection](https://huggingface.co/datasets/HPAI-BSC/Aloe-Beta-General-Collection)
241
  - [HPAI-BSC/chain-of-diagnosis](https://huggingface.co/datasets/HPAI-BSC/chain-of-diagnosis)
242
  - [HPAI-BSC/MedS-Ins](https://huggingface.co/datasets/HPAI-BSC/MedS-Ins)
243
  - [HPAI-BSC/ultramedica](https://huggingface.co/datasets/HPAI-BSC/ultramedical)
244
- - Synthetic data generated using Llama3.1:
245
  - [HPAI-BSC/pubmedqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/pubmedqa-cot-llama31)
246
  - [HPAI-BSC/medqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/medqa-cot-llama31)
247
  - [HPAI-BSC/medmcqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/medmcqa-cot-llama31)
@@ -249,7 +250,7 @@ The training set consists of around 1.8B tokens, having 3 different types of dat
249
  - [HPAI-BSC/MMLU-medical-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/MMLU-medical-cot-llama31)
250
  - [HPAI-BSC/Polymed-QA](https://huggingface.co/datasets/HPAI-BSC/Polymed-QA)
251
  - Genstruct data (coming soon)
252
- - General data:
253
  - [HPAI-BSC/Aloe-Beta-General-Collection](https://huggingface.co/datasets/HPAI-BSC/Aloe-Beta-General-Collection)
254
 
255
  #### Training parameters
 
44
 
45
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62f7a16192950415b637e201/VUYw4IdANKGrH2VOedwH0.png)
46
 
47
+ **Aloe-8B-Beta** is the latest iteration in the **Aloe family**, building and improving on the success of its predecessor, [Aloe-8B-Alpha](https://huggingface.co/HPAI-BSC/Llama3-Aloe-8B-Alpha).
48
+ Beta more than triples the training data used by Alpha, for a total of **1.8B tokens**, including a wider variety of medical tasks and instructions (e.g., text summarization, explanation, diagnosis, text classification, treatment recommendation, ...).
49
 
50
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62f7a16192950415b637e201/bCuV5kZUT9H9UECAOWDRc.png)
51
 
52
+ To mitigate catastrophic forgetting and enable the model to effectively learn new capabilities like **function calling**, we incorporated a diverse set of high-quality general-purpose data constituting 20% of the total training set. The curated data includes some of the highest-quality content available across a range of topics, including mathematics, programming, STEM, and very long instructions (> 8k tokens), to enrich the model's adaptability and comprehension across diverse domains.
53
+
54
  Beta also boosts the alignment and safety stages with respect to Alpha. This includes a [medical preference dataset](https://huggingface.co/datasets/TsinghuaC3I/UltraMedical-Preference), as well as the red-teaming dataset (available soon).
55
 
56
  Complete training details, model merging configurations, and all training data (including synthetically generated data) can be found below. This includes [the RAG system](https://github.com/HPAI-BSC/prompt_engine) that was developed to test Aloe Beta in a deployment setup. Aloe comes with a healthcare-specific risk assessment to facilitate to the safe use and deployment of such systems.
 
79
 
80
  The Beta model has been developed to excel in several different medical tasks. For this reason, we evaluated the model in many different medical tasks:
81
 
82
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/FyHZXoXCbc7AzXeCwqS9_.png)
83
 
84
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6620f941eba5274b5c12f83d/uS3qddvQ5iwbI0WZGVFDF.png)
 
 
85
 
86
  We also compared the performance of the model in the general domain, using the OpenLLM Leaderboard benchmark. Aloe-Beta gets competitive results with the current SOTA general models in the most used general benchmarks and outperforms the medical models:
87
 
 
237
 
238
  The training set consists of around 1.8B tokens, having 3 different types of data:
239
 
240
+ - Medical domain datasets. Includes data from 20 different medical tasks.
241
  - [HPAI-BSC/Aloe-Beta-General-Collection](https://huggingface.co/datasets/HPAI-BSC/Aloe-Beta-General-Collection)
242
  - [HPAI-BSC/chain-of-diagnosis](https://huggingface.co/datasets/HPAI-BSC/chain-of-diagnosis)
243
  - [HPAI-BSC/MedS-Ins](https://huggingface.co/datasets/HPAI-BSC/MedS-Ins)
244
  - [HPAI-BSC/ultramedica](https://huggingface.co/datasets/HPAI-BSC/ultramedical)
245
+ - Synthetic data. We expanded our training data by generating high-quality answers using Llama3.1-70B.
246
  - [HPAI-BSC/pubmedqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/pubmedqa-cot-llama31)
247
  - [HPAI-BSC/medqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/medqa-cot-llama31)
248
  - [HPAI-BSC/medmcqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/medmcqa-cot-llama31)
 
250
  - [HPAI-BSC/MMLU-medical-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/MMLU-medical-cot-llama31)
251
  - [HPAI-BSC/Polymed-QA](https://huggingface.co/datasets/HPAI-BSC/Polymed-QA)
252
  - Genstruct data (coming soon)
253
+ - General data. It includes maths, STEM, code, function calling, and instruction of very long instructions.
254
  - [HPAI-BSC/Aloe-Beta-General-Collection](https://huggingface.co/datasets/HPAI-BSC/Aloe-Beta-General-Collection)
255
 
256
  #### Training parameters