InferenceIllusionist committed • 4f4c3d8 • Parent(s): 85478d1
Update README.md
README.md
CHANGED
@@ -21,6 +21,7 @@ An initial foray into the world of fine-tuning. The goal of this release was to
 * [Excalibur-7b](https://huggingface.co/InferenceIllusionist/Excalibur-7b) fine-tuned with Direct Preference Optimization (DPO) using Intel/orca_dpo_pairs
 * This is a quick experiment to determine the impact of DPO finetuning on the original base model
 * Ran for a little over an hour on a single A100
+* Internal benchmarks showed improvement over base model, awaiting final results
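For context on what the DPO fine-tuning step above is optimizing: DPO trains the policy to widen the chosen-vs-rejected log-probability margin relative to a frozen reference model. The sketch below is illustrative only (it is not the training script used for this commit); the function name, the `beta=0.1` default, and the scalar log-probability inputs are assumptions for the example.

```python
import math

def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-preference-pair DPO loss: -log sigmoid(beta * margin).

    Inputs are the summed log-probabilities of the chosen and rejected
    responses under the policy being tuned and the frozen reference model.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable form of -log(sigmoid(margin)) = log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

# If the policy has not moved from the reference, the margin is 0
# and the loss is ln(2) ≈ 0.6931.
print(round(dpo_pair_loss(-10.0, -12.0, -10.0, -12.0), 4))  # → 0.6931
```

The loss falls below ln(2) exactly when the tuned policy prefers the chosen response more strongly than the reference does, which is the signal a quick experiment like this one is probing for.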