metadata

library_name: transformers
license: mit
datasets:
  - teknium/OpenHermes-2.5
  - HuggingFaceH4/ultrafeedback_binarized
  - argilla/distilabel-intel-orca-dpo-pairs
  - jondurbin/py-dpo-v0.1
  - argilla/distilabel-math-preference-dpo
pipeline_tag: text-generation

Phi-1.5

The language model Phi-1.5 is a Transformer with 1.3 billion parameters. It was trained using the same data sources as phi-1, augmented with a new data source consisting of various synthetic NLP texts. When assessed against benchmarks testing common sense, language understanding, and logical reasoning, Phi-1.5 demonstrates nearly state-of-the-art performance among models with fewer than 10 billion parameters.

Phi-1_5-Instruct-v0.1

The model underwent post-training for instruction following, combining supervised fine-tuning (SFT) and direct preference optimization (DPO). I used the trl library and a single A100 40GB GPU for both the SFT and DPO steps.
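For readers unfamiliar with the DPO objective, here is a minimal sketch of the standard per-example DPO loss in plain Python. It is not the training code used for this model. The function name `dpo_loss` and the value `beta=0.1` are illustrative assumptions, since the actual hyperparameters are not stated here.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are summed token log-probabilities of the chosen and rejected
    responses under the policy and the frozen reference model.
    beta=0.1 is an assumed value, not one taken from this model card.
    """
    # How much more the policy prefers "chosen" over "rejected"
    # than the reference model does, scaled by beta.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference model agree exactly, the margin is zero and the loss equals log 2. The loss falls as the policy shifts probability toward the chosen response relative to the reference.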

Supervised Fine-Tuning
- Used 128,000 instruction-response pairs from the teknium/OpenHermes-2.5 dataset
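
As a rough illustration of preparing such pairs, the sketch below converts ShareGPT-style records (a `conversations` list of `from`/`value` turns, the layout OpenHermes-2.5 uses) into single training strings. The helper names and the `### Instruction:` / `### Response:` template are hypothetical and are not the template used for this model. How the 128,000 examples were selected is not stated, so the `limit` argument simply keeps the first usable records.

```python
def sharegpt_to_pair(record):
    """Return (instruction, response) from the first human/gpt exchange,
    or None if the record has no such exchange."""
    instruction = None
    for turn in record["conversations"]:
        if turn["from"] == "human" and instruction is None:
            instruction = turn["value"].strip()
        elif turn["from"] == "gpt" and instruction is not None:
            return instruction, turn["value"].strip()
    return None


def format_example(instruction, response):
    """Hypothetical prompt template; an assumption for illustration only."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n{response}"


def build_sft_texts(records, limit=128_000):
    """Format up to `limit` usable records into training strings."""
    texts = []
    for record in records:
        pair = sharegpt_to_pair(record)
        if pair is None:
            continue  # skip records without a human/gpt exchange
        texts.append(format_example(*pair))
        if len(texts) >= limit:
            break
    return texts
```

The resulting list of strings could then be wrapped in a dataset and passed to an SFT trainer such as the one provided by trl.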
Direct Preference Optimization (DPO)
- Used a combination of the following preference datasets