Update README.md
README.md CHANGED
@@ -42,6 +42,8 @@ An experimental fine-tune of [mixtral-8x7b-v0.1](https://huggingface.co/mistrala

This is the model after the SFT phase, before DPO has been applied.

+Hardware kindly provided by [Massed Compute](https://massedcompute.com/?utm_source=huggingface&utm_creative_format=model_card&utm_content=creator_jon)
+
### Data sources

*Yes, you will see benchmark names in the list, but this only uses the train splits, and a decontamination by cosine similarity is performed at the end as a sanity check*
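The decontamination check mentioned above is described but not shown in the diff. As a rough illustration only, here is a minimal sketch of a cosine-similarity filter over TF-IDF vectors; the function name, the scikit-learn helpers, and the 0.95 threshold are assumptions for this sketch, not the actual bagel pipeline:

```python
# Hypothetical sketch: drop any training example whose TF-IDF vector is too
# close to a benchmark test item. The threshold is an illustrative assumption.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def decontaminate(train_texts, test_texts, threshold=0.95):
    # Fit a single vocabulary over both splits so the vectors are comparable.
    vectorizer = TfidfVectorizer().fit(train_texts + test_texts)
    train_vecs = vectorizer.transform(train_texts)
    test_vecs = vectorizer.transform(test_texts)
    # similarity[i, j] = cosine similarity of train item i and test item j.
    similarity = cosine_similarity(train_vecs, test_vecs)
    # Keep a training example only if no test item is suspiciously similar.
    return [text for text, row in zip(train_texts, similarity) if row.max() < threshold]
```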
@@ -381,6 +383,12 @@ def parse_plan(plan):
        context[parts.group(1)] = method_map[parts.group(2)](parts.group(3), **context)
```

+### Fine-tuning information
+
+You can find charts and the full configuration used to fine-tune this model on [Weights & Biases](https://wandb.ai/jondurbin/bagel-8x7b-v0.2/runs/agxjjdso?workspace=user-jondurbin).
+
+The model was fine-tuned on an 8x A6000 instance, for 4 days, 15 hours, 6 minutes and 42 seconds.
+
### Licence and usage restrictions

The base model is mixtral-8x7b-v0.1, which is licensed as apache-2.0 - no issues there.
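The `parse_plan` fragment carried as context in the hunk above shows a regex-dispatch pattern: each plan step is matched into an output name, a method name, and an argument, then executed through `method_map`, with earlier results passed along via `context`. Since the diff shows only one line of `parse_plan`, the following self-contained sketch of that pattern is an assumption; the regex, the example methods, and the plan format are illustrative, not the README's actual code:

```python
import re

# Illustrative tools; the real README wires up its own callables.
method_map = {
    "search": lambda query, **ctx: f"results for {query}",
    "summarize": lambda key, **ctx: f"summary of {ctx.get(key, key)}",
}


def parse_plan(plan):
    """Execute plan lines shaped like 'step1 = search(weather in Paris)'."""
    context = {}
    for line in plan.splitlines():
        # Groups: 1 = output name, 2 = method name, 3 = raw argument string.
        parts = re.match(r"\s*(\w+)\s*=\s*(\w+)\((.*)\)\s*$", line)
        if not parts:
            continue
        # The line shown in the diff: run the step and stash its result so
        # later steps can reference earlier outputs through **context.
        context[parts.group(1)] = method_map[parts.group(2)](parts.group(3), **context)
    return context


# Example: the second step consumes the first step's output via context.
print(parse_plan("step1 = search(weather in Paris)\nstep2 = summarize(step1)"))
```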