Update README.md
README.md CHANGED
@@ -42,6 +42,8 @@ An experimental fine-tune of [mixtral-8x7b-v0.1](https://huggingface.co/mistrala

This is the model after the SFT phase, before DPO has been applied.

+Hardware kindly provided by [Massed Compute](https://massedcompute.com/?utm_source=huggingface&utm_creative_format=model_card&utm_content=creator_jon)
+
### Data sources

*Yes, you will see benchmark names in the list, but this only uses the train splits, and a decontamination by cosine similarity is performed at the end as a sanity check*
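The decontamination check mentioned above is described but not shown in the diff. As a rough illustration only, here is a minimal sketch of a cosine-similarity filter over TF-IDF vectors; the function name, the scikit-learn helpers, and the 0.95 threshold are assumptions for this sketch, not the actual bagel pipeline:

```python
# Hypothetical sketch: drop any training example whose TF-IDF vector is too
# close to a benchmark test item. The threshold is an illustrative assumption.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def decontaminate(train_texts, test_texts, threshold=0.95):
    # Fit a single vocabulary over both splits so the vectors are comparable.
    vectorizer = TfidfVectorizer().fit(train_texts + test_texts)
    train_vecs = vectorizer.transform(train_texts)
    test_vecs = vectorizer.transform(test_texts)
    # similarity[i, j] = cosine similarity of train item i and test item j.
    similarity = cosine_similarity(train_vecs, test_vecs)
    # Keep a training example only if no test item is suspiciously similar.
    return [text for text, row in zip(train_texts, similarity) if row.max() < threshold]
```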
@@ -381,6 +383,12 @@ def parse_plan(plan):
        context[parts.group(1)] = method_map[parts.group(2)](parts.group(3), **context)
```

+### Fine-tuning information
+
+You can find charts and the full configuration used to fine-tune this model on [Weights & Biases](https://wandb.ai/jondurbin/bagel-8x7b-v0.2/runs/agxjjdso?workspace=user-jondurbin).
+
+The model was fine-tuned on an 8x A6000 instance, for 4 days, 15 hours, 6 minutes and 42 seconds.
+
### Licence and usage restrictions

The base model is mixtral-8x7b-v0.1, which is licensed as apache-2.0 - no issues there.
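The `parse_plan` fragment carried as context in the hunk above shows a regex-dispatch pattern: each plan step is matched into an output name, a method name, and an argument, then executed through `method_map`, with earlier results passed along via `context`. Since the diff shows only one line of `parse_plan`, the following self-contained sketch of that pattern is an assumption; the regex, the example methods, and the plan format are illustrative, not the README's actual code:

```python
import re

# Illustrative tools; the real README wires up its own callables.
method_map = {
    "search": lambda query, **ctx: f"results for {query}",
    "summarize": lambda key, **ctx: f"summary of {ctx.get(key, key)}",
}


def parse_plan(plan):
    """Execute plan lines shaped like 'step1 = search(weather in Paris)'."""
    context = {}
    for line in plan.splitlines():
        # Groups: 1 = output name, 2 = method name, 3 = raw argument string.
        parts = re.match(r"\s*(\w+)\s*=\s*(\w+)\((.*)\)\s*$", line)
        if not parts:
            continue
        # The line shown in the diff: run the step and stash its result so
        # later steps can reference earlier outputs through **context.
        context[parts.group(1)] = method_map[parts.group(2)](parts.group(3), **context)
    return context


# Example: the second step consumes the first step's output via context.
print(parse_plan("step1 = search(weather in Paris)\nstep2 = summarize(step1)"))
```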