---
model-index:
  - name: notus-7b-v1-lora-adapter
    results: []
datasets:
  - argilla/ultrafeedback-binarized-avg-rating-for-dpo
language:
  - en
base_model: alignment-handbook/zephyr-7b-sft-full
library_name: peft
pipeline_tag: text-generation
tags:
  - dpo
  - preference
  - ultrafeedback
license: apache-2.0
---

# Model Card for Notus 7B v1 (LoRA Adapters)

*Image was artificially generated by DALL·E 3 via ChatGPT Pro*

Notus is a collection of models fine-tuned with Direct Preference Optimization (DPO), similar to Zephyr, but focused mainly on the DPO step, with the aim of incorporating preference feedback into LLM fine-tuning. Notus models are intended to be used as assistants via chat-like applications, and are evaluated with the MT-Bench, AlpacaEval, and LM Evaluation Harness benchmarks, so they can be compared directly with the Zephyr models that were also fine-tuned using DPO.
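
For context, DPO fine-tunes the policy directly on preference pairs, rewarding the chosen response over the rejected one without training a separate reward model; its standard objective (Rafailov et al., 2023) is:

$$
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
$$

where \\(y_w\\) and \\(y_l\\) are the chosen and rejected completions for a prompt \\(x\\), \\(\pi_{\text{ref}}\\) is the SFT reference model, and \\(\beta\\) controls how far the fine-tuned policy may drift from it.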

## Model Details

### Model Description

- **Developed by:** Argilla, Inc. (building on the previous efforts and amazing work of the Hugging Face H4 team and Mistral AI)
- **Shared by:** Argilla, Inc.
- **Model type:** GPT-like 7B model, DPO fine-tuned using LoRA
- **Language(s) (NLP):** Mainly English
- **License:** Apache 2.0 (same as Zephyr 7B SFT and Mistral 7B v0.1)
- **Finetuned from model:** alignment-handbook/zephyr-7b-sft-full

## Usage

As this repository only contains the LoRA adapters, you will need to use PEFT to load them on top of the base model and, optionally, merge them into it first, as shown in the sketch below.
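
The following is a minimal sketch of doing so with `transformers` and `peft`. The adapter repository id (`argilla/notus-7b-v1-lora-adapter`, taken from the metadata above), the prompt, and the sampling parameters are assumptions; adjust them as needed, and note that it relies on the base tokenizer shipping Zephyr's chat template.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the base model (alignment-handbook/zephyr-7b-sft-full, resolved from
# the adapter config) with the LoRA adapters applied on top of it
model = AutoPeftModelForCausalLM.from_pretrained(
    "argilla/notus-7b-v1-lora-adapter",  # assumed adapter repository id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Merge the adapters into the base weights so that inference runs on a
# single, standard transformers model
model = model.merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained("alignment-handbook/zephyr-7b-sft-full")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What does DPO stand for?"},
]
# Format the conversation with the tokenizer's chat template (assumed present)
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

After merging, you can also call `model.save_pretrained` on a local directory to persist a standalone merged checkpoint that no longer requires `peft` at inference time.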