---
license: mit
datasets:
- mlabonne/orpo-dpo-mix-40k
---

This is an uncensored version of Phi-3, abliterated using the guide here: https://huggingface.co/blog/mlabonne/abliteration

It was then fine-tuned with DPO on mlabonne/orpo-dpo-mix-40k.

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
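A minimal inference sketch with `transformers` is below; the prompt and generation settings are illustrative assumptions, not part of the original training or evaluation setup.

```python
# Minimal sketch: load the model from this card and generate a chat completion.
# Assumes a recent transformers with chat-template support; adjust dtype/device to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cowWhySo/Phi-3-mini-4k-instruct-Friendly"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain abliteration in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```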
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: cowWhySo/Phi-3-mini-4k-instruct-Friendly
trust_remote_code: true
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
chat_template: phi_3

load_in_8bit: false
load_in_4bit: true
strict: false
save_safetensors: true

rl: dpo
datasets:
  - path: mlabonne/orpo-dpo-mix-40k
    split: train
    type: chatml.intel

dataset_prepared_path:
val_set_size: 0.0
output_dir: ./out

sequence_len: 4096
sample_packing: false
pad_to_sequence_len: false

adapter: qlora
lora_model_dir:
lora_r: 64
lora_alpha: 32
lora_dropout: 0.1
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: axolotl
wandb_entity:
wandb_watch:
wandb_name: phi3-mini-4k-instruct-Friendly
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 4
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: linear
learning_rate: 5e-6

train_on_inputs: false
group_by_length: false
bf16: auto

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: True
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 150
evals_per_epoch: 0
eval_table_size:
eval_table_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed: deepspeed_configs/zero3.json
weight_decay: 0.01
max_grad_norm: 1.0
resize_token_embeddings_to_32x: true
```

</details>
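For reference, here is a rough sketch of what the QLoRA portion of this config (4-bit base model, `lora_r: 64`, `lora_alpha: 32`, `lora_dropout: 0.1`, all linear layers targeted) corresponds to when written directly with `transformers` and `peft`. This is an illustration only; Axolotl constructs the model internally and exact arguments vary by library version.

```python
# Sketch of the QLoRA setup described by the config above, expressed with transformers + peft.
# Illustrative only: Axolotl builds this internally; argument names may differ across versions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # load_in_4bit: true
    bnb_4bit_compute_dtype=torch.bfloat16,   # bf16: auto (compute dtype is an assumption)
)

model = AutoModelForCausalLM.from_pretrained(
    "cowWhySo/Phi-3-mini-4k-instruct-Friendly",  # base_model
    quantization_config=bnb_config,
    trust_remote_code=True,                      # trust_remote_code: true
)

lora_config = LoraConfig(
    r=64,                         # lora_r
    lora_alpha=32,                # lora_alpha
    lora_dropout=0.1,             # lora_dropout
    target_modules="all-linear",  # lora_target_linear: true (requires a recent peft release)
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

With `micro_batch_size: 4` and `gradient_accumulation_steps: 8`, the effective batch size is 32 per GPU (multiplied by the number of GPUs under the DeepSpeed ZeRO-3 config).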

## Quants

GGUF: https://huggingface.co/cowWhySo/Phi-3-mini-4k-instruct-Friendly-gguf

## Benchmarks

| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|--------------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[Phi-3-mini-4k-instruct-Friendly](https://huggingface.co/cowWhySo/Phi-3-mini-4k-instruct-Friendly)| 41| 67.56| 46.36| 39.3| 48.56|

### AGIEval

| Task |Version| Metric |Value| |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat | 0|acc |22.05|± | 2.61|
| | |acc_norm|22.05|± | 2.61|
|agieval_logiqa_en | 0|acc |41.01|± | 1.93|
| | |acc_norm|41.32|± | 1.93|
|agieval_lsat_ar | 0|acc |22.17|± | 2.75|
| | |acc_norm|22.17|± | 2.75|
|agieval_lsat_lr | 0|acc |45.69|± | 2.21|
| | |acc_norm|45.88|± | 2.21|
|agieval_lsat_rc | 0|acc |59.48|± | 3.00|
| | |acc_norm|56.51|± | 3.03|
|agieval_sat_en | 0|acc |75.24|± | 3.01|
| | |acc_norm|70.39|± | 3.19|
|agieval_sat_en_without_passage| 0|acc |39.81|± | 3.42|
| | |acc_norm|37.86|± | 3.39|
|agieval_sat_math | 0|acc |33.64|± | 3.19|
| | |acc_norm|31.82|± | 3.15|

Average: 41.0%

### GPT4All

| Task |Version| Metric |Value| |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge| 0|acc |49.74|± | 1.46|
| | |acc_norm|50.43|± | 1.46|
|arc_easy | 0|acc |76.68|± | 0.87|
| | |acc_norm|73.23|± | 0.91|
|boolq | 1|acc |79.27|± | 0.71|
|hellaswag | 0|acc |57.91|± | 0.49|
| | |acc_norm|77.13|± | 0.42|
|openbookqa | 0|acc |35.00|± | 2.14|
| | |acc_norm|43.80|± | 2.22|
|piqa | 0|acc |77.86|± | 0.97|
| | |acc_norm|79.54|± | 0.94|
|winogrande | 0|acc |69.53|± | 1.29|

Average: 67.56%

### TruthfulQA

| Task |Version|Metric|Value| |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc| 1|mc1 |31.21|± | 1.62|
| | |mc2 |46.36|± | 1.55|

Average: 46.36%

### Bigbench

| Task |Version| Metric |Value| |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|54.74|± | 3.62|
|bigbench_date_understanding | 0|multiple_choice_grade|66.67|± | 2.46|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|29.46|± | 2.84|
|bigbench_geometric_shapes | 0|multiple_choice_grade|11.98|± | 1.72|
| | |exact_str_match | 0.00|± | 0.00|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|28.00|± | 2.01|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|17.14|± | 1.43|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|45.67|± | 2.88|
|bigbench_movie_recommendation | 0|multiple_choice_grade|24.40|± | 1.92|
|bigbench_navigate | 0|multiple_choice_grade|53.70|± | 1.58|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|68.10|± | 1.04|
|bigbench_ruin_names | 0|multiple_choice_grade|31.03|± | 2.19|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|15.93|± | 1.16|
|bigbench_snarks | 0|multiple_choice_grade|77.35|± | 3.12|
|bigbench_sports_understanding | 0|multiple_choice_grade|52.64|± | 1.59|
|bigbench_temporal_sequences | 0|multiple_choice_grade|51.50|± | 1.58|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|19.52|± | 1.12|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|13.89|± | 0.83|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|45.67|± | 2.88|

Average: 39.3%

Average score: 48.56%

## Training Summary

```json
{
  "train/loss": 0.299,
  "train/grad_norm": 0.9337566701340533,
"train/learning_rate": 0, "train/rewards/chosen": 0.08704188466072083, "train/rewards/rejected": -2.835820436477661, "train/rewards/accuracies": 0.84375, "train/rewards/margins": 2.9228620529174805, "train/logps/rejected": -509.9840393066406, "train/logps/chosen": -560.8234252929688, "train/logits/rejected": 1.6356163024902344, "train/logits/chosen": 1.7323706150054932, "train/epoch": 1.002169197396963, "train/global_step": 231, "_timestamp": 1717711643.3345022, "_runtime": 22808.557655334473, "_step": 231, "train_runtime": 22809.152, "train_samples_per_second": 1.944, "train_steps_per_second": 0.01, "total_flos": 0, "train_loss": 0.44557410065745895, "_wandb": { "runtime": 22810 } } ```