Update README.md
README.md CHANGED
@@ -334,11 +334,12 @@ We used [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) library. We aligned the

 #### Summary

-To compare Aloe with the most competitive open models (both general purpose and healthcare-specific), we use popular healthcare datasets (PubMedQA, MedMCQA, MedQA and MMLU for six medical tasks only), together with the new and highly reliable CareQA.

-Benchmark results indicate the training conducted on Aloe has boosted its performance above Llama3-8B-Instruct. Llama3-Aloe-8B-Alpha outperforms larger models like Meditron 70B, and is close to larger base models, like Yi-34B. For the former, this gain is consistent even when using SC-CoT, using their best-reported variant. All these results make Llama3-Aloe-8B-Alpha the best healthcare LLM of its size.

+To compare Aloe with the most competitive open models (both general purpose and healthcare-specific), we use popular healthcare datasets (PubMedQA, MedMCQA, MedQA and MMLU for six medical tasks only), together with the new and highly reliable CareQA. However, while MCQA benchmarks provide valuable insights into a model's ability to handle structured queries, they fall short of representing the full range of challenges faced in medical practice. Building on this idea, Aloe-Beta represents the next step in the evolution of the Aloe Family, designed to broaden the scope beyond the multiple-choice question-answering tasks that defined Aloe-Alpha.

+Benchmark results indicate the training conducted on Aloe has boosted its performance above Llama3.1-8B-Instruct. Llama3.1-Aloe-8B-Beta also outperforms other medical models like Llama3-OpenBioLLM and Llama3-Med42. All these results make Llama3.1-Aloe-8B-Beta the best healthcare LLM of its size.

+With the help of prompting techniques, the performance of Llama3.1-Aloe-8B-Beta improves significantly. Medprompting in particular provides a 7% increase in reported accuracy, after which Llama3.1-Aloe-8B-Beta only lags behind much bigger models like Llama-3.1-70B-Instruct or MedPalm-2. This improvement is mostly consistent across the OpenLLM Leaderboard and the other medical tasks.

 ## Environmental Impact
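The Medprompting gain mentioned above comes from ensemble prompting rather than further training. As a minimal sketch of one Medprompt component, the choice-shuffling ensemble, the snippet below queries a model on several permutations of the answer options and majority-votes the de-shuffled answers; `ask_model` and `biased_model` are illustrative stand-ins, not the actual Aloe evaluation code, and real Medprompt also adds kNN few-shot selection and chain-of-thought.

```python
import random
from collections import Counter

def shuffle_choices(choices, seed):
    """Return a shuffled copy of the answer choices plus the permutation used."""
    rng = random.Random(seed)
    order = list(range(len(choices)))
    rng.shuffle(order)
    return [choices[i] for i in order], order

def choice_shuffle_ensemble(ask_model, question, choices, n_votes=5):
    """Choice-shuffling ensemble: ask the model about several permutations of
    the options, map each pick back to the original index, majority-vote."""
    votes = []
    for seed in range(n_votes):
        shuffled, order = shuffle_choices(choices, seed)
        picked = ask_model(question, shuffled)  # index into the shuffled list
        votes.append(order[picked])             # map back to the original index
    return Counter(votes).most_common(1)[0][0]

# Toy stand-in model with a positional bias: it picks option 0 unless it
# recognizes the correct text elsewhere; shuffling washes such bias out.
CORRECT = "metformin"
def biased_model(question, shuffled):
    for i, choice in enumerate(shuffled):
        if choice == CORRECT:
            return i
    return 0

best = choice_shuffle_ensemble(
    biased_model,
    "First-line therapy for type 2 diabetes?",
    ["insulin", "metformin", "sulfonylurea", "diet only"],
)
print(best)  # -> 1, the original index of "metformin"
```

Voting over permutations is what makes the reported accuracy robust to option-order bias, a known failure mode of MCQA evaluation.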