metadata
base_model: teknium/OpenHermes-2.5-Mistral-7B
license: apache-2.0
datasets:
  - teknium/openhermes
  - argilla/ultrafeedback-binarized-preferences
  - Intel/orca_dpo_pairs
language:
  - en
library_name: transformers
pipeline_tag: text-generation

DPOpenHermes 7B

OpenHermes x Notus x Neural

This is an RL fine-tune of OpenHermes-2.5-Mistral-7B, trained with Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs and argilla/ultrafeedback-binarized-preferences preference datasets.
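For readers unfamiliar with DPO, the per-pair objective can be sketched in a few lines of plain Python. This is a minimal illustration of the standard DPO loss, not the training code used for this model. The function name `dpo_loss`, the example log-probabilities, and `beta=0.1` are assumptions chosen for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair.

    Each argument is the summed log-probability of a response (chosen or
    rejected) under the policy being trained or the frozen reference model.
    beta=0.1 is an illustrative value, not the setting used for this model.
    """
    # Implicit rewards: how much more likely the policy makes each response
    # compared with the reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Hypothetical log-probabilities: the policy prefers the chosen response
    # more strongly than the reference does, so the loss falls below log(2).
    print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

When the policy and the reference model agree exactly, the margin is zero and the loss equals log 2. Training pushes the margin up, which lowers the loss.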

DPOpenHermes was trained using QLoRA. The adapter is also provided in this model repo.

Training Details

Built with Axolotl

DPOpenHermes was trained on a single H100 80GB hosted on RunPod for ~10 hours, covering 0.6 epochs of the dataset.

https://wandb.ai/oaaic/openhermes-dpo/reports/DPOpenHermes--Vmlldzo2MTQ3NDg2

Benchmarks

AGIEval

|             Task             |Version| Metric |Value |   |Stderr|
|------------------------------|------:|--------|-----:|---|-----:|
|agieval_aqua_rat              |      0|acc     |0.2480|±  |0.0272|
|                              |       |acc_norm|0.2520|±  |0.0273|
|agieval_logiqa_en             |      0|acc     |0.3810|±  |0.0190|
|                              |       |acc_norm|0.3856|±  |0.0191|
|agieval_lsat_ar               |      0|acc     |0.2348|±  |0.0280|
|                              |       |acc_norm|0.2304|±  |0.0278|
|agieval_lsat_lr               |      0|acc     |0.5118|±  |0.0222|
|                              |       |acc_norm|0.5196|±  |0.0221|
|agieval_lsat_rc               |      0|acc     |0.5948|±  |0.0300|
|                              |       |acc_norm|0.5688|±  |0.0303|
|agieval_sat_en                |      0|acc     |0.7427|±  |0.0305|
|                              |       |acc_norm|0.7427|±  |0.0305|
|agieval_sat_en_without_passage|      0|acc     |0.4563|±  |0.0348|
|                              |       |acc_norm|0.4515|±  |0.0348|
|agieval_sat_math              |      0|acc     |0.3818|±  |0.0328|
|                              |       |acc_norm|0.3682|±  |0.0326|

Average: 0.4399

GPT4All

|    Task     |Version| Metric |Value |   |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge|      0|acc     |0.5930|±  |0.0144|
|             |       |acc_norm|0.6323|±  |0.0141|
|arc_easy     |      0|acc     |0.8443|±  |0.0074|
|             |       |acc_norm|0.8295|±  |0.0077|
|boolq        |      1|acc     |0.8599|±  |0.0061|
|hellaswag    |      0|acc     |0.6548|±  |0.0047|
|             |       |acc_norm|0.8365|±  |0.0037|
|openbookqa   |      0|acc     |0.3520|±  |0.0214|
|             |       |acc_norm|0.4640|±  |0.0223|
|piqa         |      0|acc     |0.8210|±  |0.0089|
|             |       |acc_norm|0.8335|±  |0.0087|
|winogrande   |      0|acc     |0.7466|±  |0.0122|

Average: 0.7431
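This card does not say how the suite averages are computed. One reading that matches both published figures to within 0.0001 is the mean of `acc_norm` per task, falling back to `acc` where `acc_norm` is not reported (boolq and winogrande). The sketch below checks that assumption against the AGIEval and GPT4All values listed above. The helper name `suite_average` is hypothetical.

```python
from statistics import mean

# (acc, acc_norm) per task, copied from the tables; None means not reported.
agieval = {
    "agieval_aqua_rat": (0.2480, 0.2520),
    "agieval_logiqa_en": (0.3810, 0.3856),
    "agieval_lsat_ar": (0.2348, 0.2304),
    "agieval_lsat_lr": (0.5118, 0.5196),
    "agieval_lsat_rc": (0.5948, 0.5688),
    "agieval_sat_en": (0.7427, 0.7427),
    "agieval_sat_en_without_passage": (0.4563, 0.4515),
    "agieval_sat_math": (0.3818, 0.3682),
}
gpt4all = {
    "arc_challenge": (0.5930, 0.6323),
    "arc_easy": (0.8443, 0.8295),
    "boolq": (0.8599, None),
    "hellaswag": (0.6548, 0.8365),
    "openbookqa": (0.3520, 0.4640),
    "piqa": (0.8210, 0.8335),
    "winogrande": (0.7466, None),
}


def suite_average(scores):
    """Mean of acc_norm per task, using acc when acc_norm is missing (assumed rule)."""
    return mean(norm if norm is not None else acc for acc, norm in scores.values())


if __name__ == "__main__":
    print(f"AGIEval: {suite_average(agieval):.5f}")
    print(f"GPT4All: {suite_average(gpt4all):.5f}")
```

Small differences in the last digit are expected, since the tables show values already rounded to four decimals.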

TruthfulQA

hf-causal-experimental (pretrained=openaccess-ai-collective/dpopenhermes-alpha-v1,dtype=bfloat16,trust_remote_code=True,use_accelerate=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16

|    Task     |Version|Metric|Value |   |Stderr|
|-------------|------:|------|-----:|---|-----:|
|truthfulqa_mc|      1|mc1   |0.4186|±  |0.0173|
|             |       |mc2   |0.5847|±  |0.0153|
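The Stderr column can be turned into an approximate confidence interval when comparing these scores with other models. The sketch below assumes a normal approximation at the 95% level (z = 1.96), which is a common convention and not something this card states. `confidence_interval` is a hypothetical helper.

```python
def confidence_interval(value, stderr, z=1.96):
    """Approximate interval value ± z * stderr (normal approximation, assumed)."""
    half_width = z * stderr
    return value - half_width, value + half_width


if __name__ == "__main__":
    # truthfulqa_mc mc2 from the table above
    low, high = confidence_interval(0.5847, 0.0153)
    print(f"mc2: {low:.4f} to {high:.4f}")  # prints "mc2: 0.5547 to 0.6147"
```

Two scores whose intervals overlap substantially should not be read as a clear win for either model.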