Edit model card

Model Card for decruz07/kellemar-DPO-Orca-Distilled-7B

This model was created using OpenHermes-2.5 as the base, and finetuned with argilla/distilabel-intel-orca-dpo-pairs

Model Details

Finetuned with these specific parameters: Steps: 200 Learning Rate: 5e5 Beta: 0.1

Model Description

  • Developed by: @decruz
  • Funded by [optional]: my full-time job
  • Finetuned from model [optional]: teknium/OpenHermes-2.5-Mistral-7B

Benchmarks

OpenLLM

Average ARC HellaSwag MMLU TruthfulQA Winogrande GSM8K
68.32 65.78 85.04 63.24 55.54 78.69 61.64

Nous

Model AGIEval GPT4All TruthfulQA Bigbench Average
kellemar-DPO-Orca-Distilled-7B 43.61 73.14 55.73 42.28 53.69
kellemar-Orca-DPO-7B 43.35 73.43 54.02 42.24 53.26
OpenHermes-2.5-Mistral-7B 43.07 73.12 53.04 40.96 52.38

Uses

You can use this for basic inference. You could probably finetune with this if you want to.

How to Get Started with the Model

You can create a space out of this, or use basic python code to call the model directly and make inferences to it.

[More Information Needed]

Training Details

The following was used: `training_args = TrainingArguments( per_device_train_batch_size=4, gradient_accumulation_steps=4, gradient_checkpointing=True, learning_rate=5e-5, lr_scheduler_type="cosine", max_steps=200, save_strategy="no", logging_steps=1, output_dir=new_model, optim="paged_adamw_32bit", warmup_steps=100, bf16=True, report_to="wandb", )

Create DPO trainer

dpo_trainer = DPOTrainer( model, ref_model, args=training_args, train_dataset=dataset, tokenizer=tokenizer, peft_config=peft_config, beta=0.1, max_prompt_length=1024, max_length=1536, )`

Training Data

This was trained with https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs

Training Procedure

Trained with Labonne's Google Colab Notebook on Finetuning Mistral 7B with DPO.

Model Card Authors [optional]

@decruz

Model Card Contact

@decruz on X/Twitter

Downloads last month
7
Safetensors
Model size
7.24B params
Tensor type
FP16
·
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Finetuned from

Dataset used to train decruz07/kellemar-DPO-Orca-Distilled-7B