Psychology LLaMA RLAIF 🦙🙋‍♂🤖

This is a LLaMA-7B-based language model fine-tuned for the field of psychology using Reinforcement Learning from AI Feedback (RLAIF). To learn more about RLAIF, I recommend Anthropic's great and influential 2022 paper, "Constitutional AI: Harmlessness from AI Feedback". For insights into the process of fine-tuning with Reinforcement Learning from Human Feedback (RLHF), which is a very similar process, there is a great blog post on Hugging Face.
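What separates RLAIF from RLHF is where the preference labels come from. In RLHF a human picks the better of two responses. In RLAIF an AI feedback model makes that choice, and a reward model is then trained on the resulting (chosen, rejected) pairs. The sketch below illustrates that idea under stated assumptions. `judge` is a hypothetical stand-in for the AI feedback model. The loss is the common pairwise (Bradley-Terry) formulation used for reward modeling, and is not necessarily the exact objective used for this model.

```python
import math
from typing import Callable, Tuple

# Hypothetical signature: judge(prompt, response_a, response_b) -> "A" or "B".
Judge = Callable[[str, str, str], str]


def label_pair(prompt: str, response_a: str, response_b: str, judge: Judge) -> Tuple[str, str]:
    """Return (chosen, rejected) according to an AI feedback model's verdict."""
    verdict = judge(prompt, response_a, response_b)
    if verdict == "A":
        return response_a, response_b
    if verdict == "B":
        return response_b, response_a
    raise ValueError(f"unexpected verdict: {verdict!r}")


def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise (Bradley-Terry) loss: -log(sigmoid(reward_chosen - reward_rejected))."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); the two branches avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss is small when the reward model scores the AI-preferred response higher. It grows roughly linearly as the model favours the rejected response.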

Links: Reward model; Base model

Background 💡

This model was developed as part of a thesis project in machine learning and psychology. The goal of the thesis was to compare reinforcement learning from human feedback (RLHF) with reinforcement learning from AI feedback (RLAIF). Evaluation showed that the model performed significantly better (average score out of 4: 2.98) than the base model (1.22), and not significantly worse than ChatGPT (3.20). Furthermore, the evaluation found no significant difference between this model and the RLHF model (2.70). It was trained on a total of 2,000 data points for 4 hours on a single A100 GPU through Google Colab. Although this model can produce appropriate answers, it suffers from the repetition problem: some generations keep repeating the same phrases.
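To spot the repetition problem in generated answers, one simple heuristic is to measure how many word n-grams in an output have already appeared earlier in that same output. The helper below is a hypothetical diagnostic for that purpose. It is not part of the thesis evaluation, which scored answers on a 0 to 4 scale. The choice of word-level trigrams is an assumption.

```python
def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams in `text` that already occurred earlier in it."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0  # too short to contain any n-gram
    seen = set()
    repeats = 0
    for gram in ngrams:
        if gram in seen:
            repeats += 1  # this n-gram is a repeat of an earlier one
        else:
            seen.add(gram)
    return repeats / len(ngrams)
```

A value near 0 means the answer rarely repeats itself. A value approaching 1 means the answer is stuck in a loop.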

Paper 📜

"Comparison Between RLHF and RLAIF in Fine-Tuning a Large Language Model"

The paper can be found here!

Usage 🏂

As a base model, we recommend samhog/psychology-alpaca-merged. Note that this combination still produces some answers that suffer from the repetition problem, but less frequently than samhog/psychology-llama-merged does.

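from peft import PeftModel
from transformers import BitsAndBytesConfig, LlamaForCausalLM, LlamaTokenizer

# Tokenizer of the original LLaMA-7B checkpoint
tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")

# Load the recommended merged base model in 8-bit
# (requires the bitsandbytes and accelerate packages)
model = LlamaForCausalLM.from_pretrained(
    "samhog/psychology-alpaca-merged",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Apply the RLAIF-trained adapter weights on top of the base model
model = PeftModel.from_pretrained(model, "samhog/psychology-llama-rlaif")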
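To query the loaded `model` and `tokenizer`, something along the following lines should work. It is a sketch under several assumptions. The Alpaca-style prompt template is assumed, because this card does not state which template the model was trained with. The `build_prompt` and `generate_answer` helpers are hypothetical. The `repetition_penalty` and `no_repeat_ngram_size` values are illustrative settings of standard `generate` arguments. They may reduce the repetition problem but will not eliminate it.

```python
# Assumed Alpaca-style template (no-input variant); not confirmed for this model.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(instruction: str) -> str:
    """Wrap a user question in the assumed instruction template."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


def generate_answer(model, tokenizer, question: str, max_new_tokens: int = 256) -> str:
    """Generate an answer and return only the newly generated text."""
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        repetition_penalty=1.2,   # illustrative value, discourages repeated tokens
        no_repeat_ngram_size=3,   # illustrative value, forbids repeated trigrams
    )
    prompt_len = inputs["input_ids"].shape[1]
    # Drop the prompt tokens so only the model's response is decoded
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

For example, `generate_answer(model, tokenizer, "How can I cope with exam anxiety?")` returns the model's response as a string.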

Authors: Samuel Höglund, samhog@kth.se; Josef Khedri, jkhedri@kth.se