TheBloke committed
Commit 52e91b5
1 Parent(s): e860f14

Upload README.md

Files changed (1):
  1. README.md +51 -51
README.md CHANGED
@@ -346,13 +346,15 @@ And thank you again to a16z for their generous grant.
 
  ## OpenHermes x Notus x Neural
 
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+
  This is an RL fine-tuned model of [Teknium](https://huggingface.co/teknium)'s [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) using the [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs) and [argilla/ultrafeedback-binarized-preferences](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences) preference datasets for reinforcement learning using Direct Preference Optimization (DPO).
 
  DPOpenHermes is trained using qLoRA. The adapter is also provided in this model repo.
 
- # Training Details
+ Errata: Due to an issue with the DPO-only version failing to generate an eos token, this model received additional SFT on 7000 rows from the OpenHermes dataset to teach it to use the eos_token again to end the turn. This resulted in lower benchmark scores. You can find the original DPO-only model in the `dpo-v0` branch.
 
- [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+ # Training Details
 
  DPOpenHermes was trained on a single H100 80GB hosted on RunPod for ~10h for 0.6 epochs of the dataset.
 
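For context on the recipe described in the hunk above (DPO over a qLoRA adapter), here is a minimal sketch using trl's `DPOTrainer`. This is not the authors' training code — the badge shows the model was built with Axolotl — and it assumes a 2023-era trl API in which `DPOTrainer` still accepts `tokenizer=`; the LoRA settings, beta, and schedule below are illustrative placeholders.

```python
# Illustrative sketch only: DPO on a 4-bit-quantized (qLoRA) base model.
# Dataset and base-model names come from the card; all hyperparameters
# (LoRA rank, target modules, beta, batch sizes, steps) are assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

base = "teknium/OpenHermes-2.5-Mistral-7B"

# Load the base model quantized to 4 bits; only the LoRA adapter trains.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

# One of the two preference datasets named above; DPOTrainer expects
# "prompt", "chosen", "rejected" text columns (system prompts ignored here).
ds = load_dataset("Intel/orca_dpo_pairs", split="train")
ds = ds.map(lambda row: {"prompt": row["question"]})

trainer = DPOTrainer(
    model,
    ref_model=None,   # with peft_config set, trl uses the frozen base as reference
    beta=0.1,         # assumed DPO temperature
    train_dataset=ds,
    tokenizer=tokenizer,
    peft_config=LoraConfig(r=64, lora_alpha=16, lora_dropout=0.05,
                           target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
                           task_type="CAUSAL_LM"),
    args=TrainingArguments(output_dir="dpopenhermes-sketch",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=8,
                           learning_rate=5e-5, max_steps=1000, bf16=True),
    max_length=1024, max_prompt_length=512,
)
trainer.train()
```

The card's errata also explains why a small SFT pass followed DPO: the DPO-only checkpoint stopped emitting the eos token, so a short supervised pass on chat data was used to restore turn-ending behavior, at some cost in benchmark scores.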
@@ -408,84 +410,82 @@ In LM-Studio, simply select the ChatML Prefix on the settings side pane:
  ```
  | Task |Version| Metric |Value | |Stderr|
  |------------------------------|------:|--------|-----:|---|-----:|
- |agieval_aqua_rat | 0|acc |0.2480|± |0.0272|
- | | |acc_norm|0.2520|± |0.0273|
- |agieval_logiqa_en | 0|acc |0.3810|± |0.0190|
- | | |acc_norm|0.3856|± |0.0191|
- |agieval_lsat_ar | 0|acc |0.2348|± |0.0280|
- | | |acc_norm|0.2304|± |0.0278|
- |agieval_lsat_lr | 0|acc |0.5118|± |0.0222|
- | | |acc_norm|0.5196|± |0.0221|
+ |agieval_aqua_rat | 0|acc |0.2559|± |0.0274|
+ | | |acc_norm|0.2598|± |0.0276|
+ |agieval_logiqa_en | 0|acc |0.3733|± |0.0190|
+ | | |acc_norm|0.3886|± |0.0191|
+ |agieval_lsat_ar | 0|acc |0.2522|± |0.0287|
+ | | |acc_norm|0.2522|± |0.0287|
+ |agieval_lsat_lr | 0|acc |0.5137|± |0.0222|
+ | | |acc_norm|0.5294|± |0.0221|
  |agieval_lsat_rc | 0|acc |0.5948|± |0.0300|
- | | |acc_norm|0.5688|± |0.0303|
- |agieval_sat_en | 0|acc |0.7427|± |0.0305|
- | | |acc_norm|0.7427|± |0.0305|
- |agieval_sat_en_without_passage| 0|acc |0.4563|± |0.0348|
- | | |acc_norm|0.4515|± |0.0348|
- |agieval_sat_math | 0|acc |0.3818|± |0.0328|
- | | |acc_norm|0.3682|± |0.0326|
+ | | |acc_norm|0.5725|± |0.0302|
+ |agieval_sat_en | 0|acc |0.7379|± |0.0307|
+ | | |acc_norm|0.7282|± |0.0311|
+ |agieval_sat_en_without_passage| 0|acc |0.4466|± |0.0347|
+ | | |acc_norm|0.4466|± |0.0347|
+ |agieval_sat_math | 0|acc |0.3909|± |0.0330|
+ | | |acc_norm|0.3591|± |0.0324|
  ```
 
- Average: 0.4399
+ Average: 0.4364
 
  ## BigBench Hard
 
  ```
- hf-causal-experimental (pretrained=openaccess-ai-collective/dpopenhermes-alpha-v1,dtype=bfloat16,trust_remote_code=True,use_accelerate=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16
  | Task |Version| Metric |Value | |Stderr|
  |------------------------------------------------|------:|---------------------|-----:|---|-----:|
- |bigbench_causal_judgement | 0|multiple_choice_grade|0.5632|± |0.0361|
- |bigbench_date_understanding | 0|multiple_choice_grade|0.6612|± |0.0247|
+ |bigbench_causal_judgement | 0|multiple_choice_grade|0.5684|± |0.0360|
+ |bigbench_date_understanding | 0|multiple_choice_grade|0.6667|± |0.0246|
  |bigbench_disambiguation_qa | 0|multiple_choice_grade|0.3566|± |0.0299|
  |bigbench_geometric_shapes | 0|multiple_choice_grade|0.2006|± |0.0212|
- | | |exact_str_match |0.0334|± |0.0095|
- |bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|0.3020|± |0.0206|
- |bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|0.2086|± |0.0154|
- |bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|0.5033|± |0.0289|
- |bigbench_movie_recommendation | 0|multiple_choice_grade|0.4220|± |0.0221|
+ | | |exact_str_match |0.0724|± |0.0137|
+ |bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|0.2980|± |0.0205|
+ |bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|0.2071|± |0.0153|
+ |bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|0.5067|± |0.0289|
+ |bigbench_movie_recommendation | 0|multiple_choice_grade|0.4140|± |0.0220|
  |bigbench_navigate | 0|multiple_choice_grade|0.5000|± |0.0158|
- |bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|0.7035|± |0.0102|
- |bigbench_ruin_names | 0|multiple_choice_grade|0.4107|± |0.0233|
- |bigbench_salient_translation_error_detection | 0|multiple_choice_grade|0.2154|± |0.0130|
- |bigbench_snarks | 0|multiple_choice_grade|0.7127|± |0.0337|
- |bigbench_sports_understanding | 0|multiple_choice_grade|0.6988|± |0.0146|
- |bigbench_temporal_sequences | 0|multiple_choice_grade|0.4670|± |0.0158|
- |bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.2072|± |0.0115|
- |bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1731|± |0.0090|
- |bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.5033|± |0.0289|
+ |bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|0.6980|± |0.0103|
+ |bigbench_ruin_names | 0|multiple_choice_grade|0.4174|± |0.0233|
+ |bigbench_salient_translation_error_detection | 0|multiple_choice_grade|0.2044|± |0.0128|
+ |bigbench_snarks | 0|multiple_choice_grade|0.7238|± |0.0333|
+ |bigbench_sports_understanding | 0|multiple_choice_grade|0.6876|± |0.0148|
+ |bigbench_temporal_sequences | 0|multiple_choice_grade|0.4360|± |0.0157|
+ |bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.2112|± |0.0115|
+ |bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1754|± |0.0091|
+ |bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.5067|± |0.0289|
  ```
 
- Average: 0.4338
+ Average: 0.4321
 
  ## GPT4All
 
  ```
  | Task |Version| Metric |Value | |Stderr|
  |-------------|------:|--------|-----:|---|-----:|
- |arc_challenge| 0|acc |0.5930|± |0.0144|
- | | |acc_norm|0.6323|± |0.0141|
- |arc_easy | 0|acc |0.8443|± |0.0074|
- | | |acc_norm|0.8295|± |0.0077|
+ |arc_challenge| 0|acc |0.5862|± |0.0144|
+ | | |acc_norm|0.6297|± |0.0141|
+ |arc_easy | 0|acc |0.8472|± |0.0074|
+ | | |acc_norm|0.8321|± |0.0077|
  |boolq | 1|acc |0.8599|± |0.0061|
- |hellaswag | 0|acc |0.6548|± |0.0047|
- | | |acc_norm|0.8365|± |0.0037|
- |openbookqa | 0|acc |0.3520|± |0.0214|
- | | |acc_norm|0.4640|± |0.0223|
- |piqa | 0|acc |0.8210|± |0.0089|
- | | |acc_norm|0.8335|± |0.0087|
- |winogrande | 0|acc |0.7466|± |0.0122|
+ |hellaswag | 0|acc |0.6520|± |0.0048|
+ | | |acc_norm|0.8357|± |0.0037|
+ |openbookqa | 0|acc |0.3440|± |0.0213|
+ | | |acc_norm|0.4580|± |0.0223|
+ |piqa | 0|acc |0.8199|± |0.0090|
+ | | |acc_norm|0.8319|± |0.0087|
+ |winogrande | 0|acc |0.7482|± |0.0122|
  ```
 
- Average: 0.7431
+ Average: 0.7422
 
  ## TruthfulQA
 
  ```
- hf-causal-experimental (pretrained=openaccess-ai-collective/dpopenhermes-alpha-v1,dtype=bfloat16,trust_remote_code=True,use_accelerate=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16
  | Task |Version|Metric|Value | |Stderr|
  |-------------|------:|------|-----:|---|-----:|
- |truthfulqa_mc| 1|mc1 |0.4186|± |0.0173|
- | | |mc2 |0.5847|± |0.0153|
+ |truthfulqa_mc| 1|mc1 |0.3941|± |0.0171|
+ | | |mc2 |0.5698|± |0.0154|
  ```
 
  <!-- original-model-card end -->
 
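For reference, the `hf-causal-experimental (pretrained=..., ...)` banner that the pre-change tables carried is the header printed by EleutherAI's lm-evaluation-harness, which suggests how these scores were produced. Below is a hedged sketch of reproducing one table through that harness's v0.3-era Python API; the exact harness commit and settings beyond the banner are not stated in the card and are assumed here.

```python
# Hedged reproduction sketch, assuming the v0.3-era EleutherAI
# lm-evaluation-harness API implied by the "hf-causal-experimental" banner
# above. Task names are taken from the tables; everything else is assumed.
import json
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal-experimental",
    model_args="pretrained=openaccess-ai-collective/dpopenhermes-alpha-v1,"
               "dtype=bfloat16,trust_remote_code=True,use_accelerate=True",
    tasks=["truthfulqa_mc"],  # swap in the AGIEval/BigBench/GPT4All task lists above
    num_fewshot=0,
    batch_size=16,
)
print(json.dumps(results["results"], indent=2))
```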