Commit 256c5d3
Parent(s): d16cee2

Add details on the datasets for reproducibility (#107)

- Add details on the datasets for reproducibility (24fd7d1928ac57b4c824fe7145eb0c62e21d4444)
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.huggingface.co>
- src/assets/text_content.py +13 -6
src/assets/text_content.py
CHANGED
@@ -77,10 +77,9 @@ With the plethora of large language models (LLMs) and chatbots being released we

We chose these benchmarks as they test a variety of reasoning and general knowledge across a wide variety of fields in 0-shot and few-shot settings.

-
# Some good practices before submitting a model

-
+### 1) Make sure you can load your model and tokenizer using AutoClasses:
```python
from transformers import AutoConfig, AutoModel, AutoTokenizer
config = AutoConfig.from_pretrained("your model name", revision=revision)
@@ -92,16 +91,24 @@ If this step fails, follow the error messages to debug your model before submitt
Note: make sure your model is public!
Note: if your model needs `use_remote_code=True`, we do not support this option yet but we are working on adding it, stay posted!

-
+### 2) Convert your model weights to [safetensors](https://huggingface.co/docs/safetensors/index)
It's a new format for storing weights which is safer and faster to load and use. It will also allow us to add the number of weights of your model to the `Extended Viewer`!

-
+### 3) Make sure your model has an open license!
This is a leaderboard for Open LLMs, and we'd love for as many people as possible to know they can use your model 🤗

-
+### 4) Fill up your model card
When we add extra information about models to the leaderboard, it will be automatically taken from the model card

-#
+# Reproducibility and details
+
+### Details and logs
+You can find:
+- detailed numerical results in the `results` Hugging Face dataset: https://huggingface.co/datasets/open-llm-leaderboard/results
+- details on the input/outputs for the models in the `details` Hugging Face dataset: https://huggingface.co/datasets/open-llm-leaderboard/details
+- community queries and running status in the `requests` Hugging Face dataset: https://huggingface.co/datasets/open-llm-leaderboard/requests
+
+### Reproducibility
To reproduce our results, here is the commands you can run, using [this version](https://github.com/EleutherAI/lm-evaluation-harness/tree/e47e01beea79cfe87421e2dac49e64d499c240b4) of the Eleuther AI Harness:
`python main.py --model=hf-causal --model_args="pretrained=<your_model>,use_accelerate=True,revision=<your_model_revision>"`
` --tasks=<task_list> --num_fewshot=<n_few_shot> --batch_size=2 --output_path=<output_path>`
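
For reference, the AutoClasses check in good practice 1) above is cut off by the first diff hunk. A minimal, self-contained sketch of the full check (the model id and revision below are placeholders, not real identifiers) could look like this:

```python
# Hypothetical pre-submission check: confirm that the config, model and
# tokenizer all load through the Auto* classes. Replace the placeholders
# with your own repository name and revision.
from transformers import AutoConfig, AutoModel, AutoTokenizer

model_name = "your-org/your-model"  # placeholder model id
revision = "main"                   # placeholder revision (branch or commit sha)

config = AutoConfig.from_pretrained(model_name, revision=revision)
model = AutoModel.from_pretrained(model_name, revision=revision)
tokenizer = AutoTokenizer.from_pretrained(model_name, revision=revision)

# If any of these calls fails, debug the upload before submitting the model.
print(type(config).__name__, type(model).__name__, type(tokenizer).__name__)
```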
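
Similarly, for good practice 2), one way to produce safetensors weights is simply to reload the model with transformers and re-save it with safe serialization enabled; a rough sketch, with an illustrative model id and output directory:

```python
# Rough sketch: convert existing PyTorch weights to safetensors by re-saving
# the model with safe serialization. Model id and output path are illustrative.
from transformers import AutoModel, AutoTokenizer

model_name = "your-org/your-model"  # placeholder model id

model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# safe_serialization=True writes model.safetensors instead of pytorch_model.bin
model.save_pretrained("converted-model", safe_serialization=True)
tokenizer.save_pretrained("converted-model")
```

The converted folder can then be pushed back to the model repository on the Hub before submitting it to the leaderboard.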