Commit 614649c by jondurbin (1 parent: 4b07f03)

Update README.md

Files changed (1)
  1. README.md +8 -0
README.md CHANGED
@@ -42,6 +42,8 @@ An experimental fine-tune of [mixtral-8x7b-v0.1](https://huggingface.co/mistrala
 
 This is the model after the SFT phase, before DPO has been applied.
 
+Hardware kindly provided by [Massed Compute](https://massedcompute.com/?utm_source=huggingface&utm_creative_format=model_card&utm_content=creator_jon)
+
 ### Data sources
 
 *Yes, you will see benchmark names in the list, but this only uses the train splits, and a decontamination by cosine similarity is performed at the end as a sanity check*
@@ -381,6 +383,12 @@ def parse_plan(plan):
     context[parts.group(1)] = method_map[parts.group(2)](parts.group(3), **context)
 ```
 
+### Fine-tuning information
+
+You can find charts, and the full configuration used to fine-tune this model on [weights and biases](https://wandb.ai/jondurbin/bagel-8x7b-v0.2/runs/agxjjdso?workspace=user-jondurbin)
+
+The model was fine-tuned on an 8x a6000 instance, for 4 days, 15 hours, 6 minutes and 42 seconds.
+
 ### Licence and usage restrictions
 
 The base model is mixtral-8x7b-v0.1, which is licensed as apache-2.0 - no issues there.
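The "Data sources" context above mentions a cosine-similarity decontamination pass against benchmark test splits. As a rough illustration of what such a check can look like (the encoder name, threshold, and helper function below are assumptions for this sketch, not the exact procedure used for this model):

```python
# Minimal sketch of a cosine-similarity decontamination pass (illustrative only):
# flag training examples whose embedding is too close to any benchmark test item.
# The encoder name and threshold are assumptions, not the settings used for bagel.
from sentence_transformers import SentenceTransformer

def flag_contaminated(train_texts, benchmark_texts, threshold=0.95):
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    train_emb = encoder.encode(train_texts, normalize_embeddings=True, convert_to_numpy=True)
    bench_emb = encoder.encode(benchmark_texts, normalize_embeddings=True, convert_to_numpy=True)
    sims = train_emb @ bench_emb.T   # cosine similarity, since embeddings are unit-normalized
    max_sim = sims.max(axis=1)       # closest benchmark item for each training example
    return [i for i, s in enumerate(max_sim) if s >= threshold]

# Indices returned here would be dropped from the training data before fine-tuning.
```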
 
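For a rough sense of the compute behind the fine-tuning run added in this commit: 4 days, 15 hours, 6 minutes and 42 seconds is about 111.1 wall-clock hours, so the 8x A6000 instance works out to roughly 8 × 111.1 ≈ 889 A6000 GPU-hours in total.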