---
datasets:
  - prometheus-eval/Feedback-Collection
  - prometheus-eval/Preference-Collection
library_name: transformers
pipeline_tag: text2text-generation
tags:
  - text2text-generation
---

## Links for Reference

## TL;DR

Prometheus 2 is an alternative to GPT-4 for fine-grained evaluation of an underlying LLM, and for use as a reward model in Reinforcement Learning from Human Feedback (RLHF).


Prometheus 2 is a language model that uses Mistral-Instruct as its base model. It is fine-tuned on 100K feedback instances from the Feedback Collection and 200K feedback instances from the Preference Collection. It is also built with weight merging to support both absolute grading (direct assessment) and relative grading (pairwise ranking). Surprisingly, we find that weight merging also improves performance on each format.
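To illustrate the idea of weight merging, the snippet below is a minimal sketch of plain linear interpolation between two fine-tuned checkpoints. It is not the authors' exact merging recipe; the checkpoint paths and the mixing coefficient `alpha` are placeholders for illustration only.

```python
# Illustrative sketch of linear weight merging between two fine-tuned checkpoints.
# NOT the official Prometheus 2 merging procedure; paths and alpha are placeholders.
import torch
from transformers import AutoModelForCausalLM

absolute_model = AutoModelForCausalLM.from_pretrained("path/to/absolute-grading-checkpoint")
relative_model = AutoModelForCausalLM.from_pretrained("path/to/relative-grading-checkpoint")

alpha = 0.5  # assumed mixing weight between the two checkpoints
relative_state = relative_model.state_dict()

# Interpolate every parameter tensor between the two models.
merged_state = {
    name: alpha * param + (1.0 - alpha) * relative_state[name]
    for name, param in absolute_model.state_dict().items()
}

# Load the merged weights into one of the models and save the result.
absolute_model.load_state_dict(merged_state)
absolute_model.save_pretrained("path/to/merged-prometheus-2")
```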

## Model Details

### Model Description

Prometheus 2 is trained at two different sizes (7B and 8x7B). You can find the 7B-sized LM on this page. Also check out our datasets on this page and this page.

## Prompt format

We provide wrapper functions and classes for conveniently using Prometheus 2 in our GitHub repository. We highly recommend using them!

However, if you just want to use the model directly for your use case, refer to the prompt format below. Note that absolute grading and relative grading require different prompt templates and system prompts.

WORK IN PROGRESS
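In the meantime, the snippet below is a minimal sketch of how one might query a GGUF build of this model with llama-cpp-python. The model filename, sampling settings, and the wording of the two prompts are illustrative placeholders, not the official Prometheus 2 templates; use the templates from the GitHub repository for real evaluations.

```python
# Minimal sketch: running a GGUF build of Prometheus 2 with llama-cpp-python.
# Filename, context size, and prompt text are assumptions for illustration only.
from llama_cpp import Llama

llm = Llama(model_path="prometheus-7b-v2.0.Q4_K_M.gguf", n_ctx=4096)  # assumed filename

# Absolute grading (direct assessment): ask for feedback and a 1-5 score
# against a rubric. The official template wording differs.
absolute_prompt = (
    "###Task Description: Evaluate the response against the rubric, "
    "write feedback, then give an integer score from 1 to 5.\n"
    "###Instruction: Explain photosynthesis to a 10-year-old.\n"
    "###Response: Plants use sunlight to turn water and air into food.\n"
    "###Score Rubric: Is the explanation accurate and age-appropriate?\n"
    "###Feedback:"
)

# Relative grading (pairwise ranking): compare two responses and pick the better one.
relative_prompt = (
    "###Task Description: Compare Response A and Response B against the rubric, "
    "write feedback, then output the better response (A or B).\n"
    "###Instruction: Explain photosynthesis to a 10-year-old.\n"
    "###Response A: Plants eat dirt to grow.\n"
    "###Response B: Plants use sunlight to turn water and air into food.\n"
    "###Score Rubric: Is the explanation accurate and age-appropriate?\n"
    "###Feedback:"
)

for prompt in (absolute_prompt, relative_prompt):
    out = llm(prompt, max_tokens=512, temperature=0.0)
    print(out["choices"][0]["text"])
```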