Update README.md
Commit fa9b254 by cchristophe (1 parent: 7c47475)

README.md CHANGED
@@ -161,7 +161,7 @@ Which response is of higher overall quality in a medical context? Consider:
 
 #### Win-rate
 
-[
+![plot](./pairwise_model_comparison.svg)
 
 
 ### MCQA Evaluation
@@ -198,7 +198,7 @@ Please report any software "bug" or other problems through one of the following
 
 ## Acknowledgements
 
-
+We thank the Torch FSDP team for their robust distributed training framework, the EleutherAI harness team for their valuable evaluation tools, and the Hugging Face Alignment team for their contributions to responsible AI development.
 
 ## Citation
 ```