Update README.md
README.md (CHANGED)
@@ -20,18 +20,19 @@ Welcome to the official HuggingFace repository for BiMediX, the bilingual medical
 - **Evaluation Benchmark for Arabic Medical LLMs**: Comprehensive benchmark for evaluating Arabic medical language models, setting a new standard in the field.
 - **State-of-the-Art Performance**: Outperforms existing models on medical benchmarks while being 8 times faster than comparable existing models.
 
+For full details of this model, please read our [paper (pre-print)](#).
 
 ## Getting Started
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-model_id = "
-tokenizer = AutoTokenizer.from_pretrained(model_id)
+model_id = "BiMediX/BiMediX-Bi"
 
+tokenizer = AutoTokenizer.from_pretrained(model_id)
 model = AutoModelForCausalLM.from_pretrained(model_id)
 
-text = "
+text = "Hello BiMediX! I've been experiencing increased tiredness in the past week."
 inputs = tokenizer(text, return_tensors="pt")
 
 outputs = model.generate(**inputs, max_new_tokens=500)
@@ -41,7 +42,8 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 
 ## Model Details
 
-
+
+BiMediX is built on a Mixture of Experts (MoE) architecture that uses Mixtral-8x7B as its base network. The MoE design scales efficiently through sparse computation: only a subset of the model's 47 billion parameters is active during inference. A router network directs each input token to the most relevant experts, each expert being a specialized feedforward block within the model. Training used the BiMed1.3M dataset of bilingual English-Arabic medical interactions, a corpus of over 632 million healthcare-specialized tokens. Fine-tuning applies a quantized low-rank adaptation technique (QLoRA) to adapt the model to specific tasks while keeping computational demands manageable.
 
 ## Dataset
 
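The Model Details paragraph above describes QLoRA-style fine-tuning, i.e. low-rank adapters trained on top of a quantized Mixtral-8x7B MoE base. A minimal sketch of that kind of setup with Hugging Face `transformers`, `bitsandbytes`, and `peft` is shown below; the quantization settings, adapter rank, and target modules are illustrative assumptions, not the configuration reported for BiMediX.

```python
# Illustrative sketch of a QLoRA-style setup (quantized base model + low-rank adapters).
# The adapter hyperparameters and target modules are assumptions for illustration,
# not the published BiMediX training configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "BiMediX/BiMediX-Bi"  # same checkpoint id as in the Getting Started snippet

# Load the MoE base in 4-bit NF4 so the 47B-parameter network fits a modest GPU budget.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place the experts across available devices automatically
)

# Attach small low-rank adapters; only these matrices are trained, the quantized base stays frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports how few parameters the adapters add
```

Because only the adapter weights receive gradients while the quantized expert weights stay frozen, a recipe like this keeps fine-tuning memory and compute requirements manageable, which matches the efficiency rationale given in the Model Details paragraph.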