---
license: mit
datasets:
- mlabonne/orpo-dpo-mix-40k
---

This is an uncensored version of Phi-3, abliterated using the guide here: https://huggingface.co/blog/mlabonne/abliteration

It was then fine-tuned with DPO on mlabonne/orpo-dpo-mix-40k.

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
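A minimal inference sketch with `transformers` is below; the prompt and generation settings are illustrative assumptions, not part of the original training or evaluation setup.

```python
# Minimal sketch: load the model from this card and generate a chat completion.
# Assumes a recent transformers with chat-template support; adjust dtype/device to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cowWhySo/Phi-3-mini-4k-instruct-Friendly"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain abliteration in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```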
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: cowWhySo/Phi-3-mini-4k-instruct-Friendly
trust_remote_code: true
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
chat_template: phi_3

load_in_8bit: false
load_in_4bit: true
strict: false
save_safetensors: true

rl: dpo
datasets:
  - path: mlabonne/orpo-dpo-mix-40k
    split: train
    type: chatml.intel

dataset_prepared_path:
val_set_size: 0.0
output_dir: ./out

sequence_len: 4096
sample_packing: false
pad_to_sequence_len: false

adapter: qlora
lora_model_dir:
lora_r: 64
lora_alpha: 32
lora_dropout: 0.1
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: axolotl
wandb_entity:
wandb_watch:
wandb_name: phi3-mini-4k-instruct-Friendly
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 4
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: linear
learning_rate: 5e-6

train_on_inputs: false
group_by_length: false
bf16: auto

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: True
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 150
evals_per_epoch: 0
eval_table_size:
eval_table_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed: deepspeed_configs/zero3.json
weight_decay: 0.01
max_grad_norm: 1.0
resize_token_embeddings_to_32x: true
```

</details>
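For reference, here is a rough sketch of what the QLoRA portion of this config (4-bit base model, `lora_r: 64`, `lora_alpha: 32`, `lora_dropout: 0.1`, all linear layers targeted) corresponds to when written directly with `transformers` and `peft`. This is an illustration only; Axolotl constructs the model internally and exact arguments vary by library version.

```python
# Sketch of the QLoRA setup described by the config above, expressed with transformers + peft.
# Illustrative only: Axolotl builds this internally; argument names may differ across versions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # load_in_4bit: true
    bnb_4bit_compute_dtype=torch.bfloat16,   # bf16: auto (compute dtype is an assumption)
)

model = AutoModelForCausalLM.from_pretrained(
    "cowWhySo/Phi-3-mini-4k-instruct-Friendly",  # base_model
    quantization_config=bnb_config,
    trust_remote_code=True,                      # trust_remote_code: true
)

lora_config = LoraConfig(
    r=64,                         # lora_r
    lora_alpha=32,                # lora_alpha
    lora_dropout=0.1,             # lora_dropout
    target_modules="all-linear",  # lora_target_linear: true (requires a recent peft release)
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

With `micro_batch_size: 4` and `gradient_accumulation_steps: 8`, the effective batch size is 32 per GPU (multiplied by the number of GPUs under the DeepSpeed ZeRO-3 config).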

## Quants

GGUF: https://huggingface.co/cowWhySo/Phi-3-mini-4k-instruct-Friendly-gguf

## Benchmarks

| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|--------------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[Phi-3-mini-4k-instruct-Friendly](https://huggingface.co/cowWhySo/Phi-3-mini-4k-instruct-Friendly)| 41| 67.56| 46.36| 39.3| 48.56|

### AGIEval

| Task |Version| Metric |Value| |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat | 0|acc |22.05|± | 2.61|
| | |acc_norm|22.05|± | 2.61|
|agieval_logiqa_en | 0|acc |41.01|± | 1.93|
| | |acc_norm|41.32|± | 1.93|
|agieval_lsat_ar | 0|acc |22.17|± | 2.75|
| | |acc_norm|22.17|± | 2.75|
|agieval_lsat_lr | 0|acc |45.69|± | 2.21|
| | |acc_norm|45.88|± | 2.21|
|agieval_lsat_rc | 0|acc |59.48|± | 3.00|
| | |acc_norm|56.51|± | 3.03|
|agieval_sat_en | 0|acc |75.24|± | 3.01|
| | |acc_norm|70.39|± | 3.19|
|agieval_sat_en_without_passage| 0|acc |39.81|± | 3.42|
| | |acc_norm|37.86|± | 3.39|
|agieval_sat_math | 0|acc |33.64|± | 3.19|
| | |acc_norm|31.82|± | 3.15|

Average: 41.0%

### GPT4All

| Task |Version| Metric |Value| |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge| 0|acc |49.74|± | 1.46|
| | |acc_norm|50.43|± | 1.46|
|arc_easy | 0|acc |76.68|± | 0.87|
| | |acc_norm|73.23|± | 0.91|
|boolq | 1|acc |79.27|± | 0.71|
|hellaswag | 0|acc |57.91|± | 0.49|
| | |acc_norm|77.13|± | 0.42|
|openbookqa | 0|acc |35.00|± | 2.14|
| | |acc_norm|43.80|± | 2.22|
|piqa | 0|acc |77.86|± | 0.97|
| | |acc_norm|79.54|± | 0.94|
|winogrande | 0|acc |69.53|± | 1.29|

Average: 67.56%

### TruthfulQA

| Task |Version|Metric|Value| |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc| 1|mc1 |31.21|± | 1.62|
| | |mc2 |46.36|± | 1.55|

Average: 46.36%

### Bigbench

| Task |Version| Metric |Value| |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|54.74|± | 3.62|
|bigbench_date_understanding | 0|multiple_choice_grade|66.67|± | 2.46|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|29.46|± | 2.84|
|bigbench_geometric_shapes | 0|multiple_choice_grade|11.98|± | 1.72|
| | |exact_str_match | 0.00|± | 0.00|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|28.00|± | 2.01|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|17.14|± | 1.43|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|45.67|± | 2.88|
|bigbench_movie_recommendation | 0|multiple_choice_grade|24.40|± | 1.92|
|bigbench_navigate | 0|multiple_choice_grade|53.70|± | 1.58|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|68.10|± | 1.04|
|bigbench_ruin_names | 0|multiple_choice_grade|31.03|± | 2.19|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|15.93|± | 1.16|
|bigbench_snarks | 0|multiple_choice_grade|77.35|± | 3.12|
|bigbench_sports_understanding | 0|multiple_choice_grade|52.64|± | 1.59|
|bigbench_temporal_sequences | 0|multiple_choice_grade|51.50|± | 1.58|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|19.52|± | 1.12|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|13.89|± | 0.83|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|45.67|± | 2.88|

Average: 39.3%

Average score: 48.56%

## Training Summary

```json
{
  "train/loss": 0.299,
  "train/grad_norm": 0.9337566701340533,
"train/learning_rate": 0, "train/rewards/chosen": 0.08704188466072083, "train/rewards/rejected": -2.835820436477661, "train/rewards/accuracies": 0.84375, "train/rewards/margins": 2.9228620529174805, "train/logps/rejected": -509.9840393066406, "train/logps/chosen": -560.8234252929688, "train/logits/rejected": 1.6356163024902344, "train/logits/chosen": 1.7323706150054932, "train/epoch": 1.002169197396963, "train/global_step": 231, "_timestamp": 1717711643.3345022, "_runtime": 22808.557655334473, "_step": 231, "train_runtime": 22809.152, "train_samples_per_second": 1.944, "train_steps_per_second": 0.01, "total_flos": 0, "train_loss": 0.44557410065745895, "_wandb": { "runtime": 22810 } } ```