Psychology LLaMA RLHF 🦙🙋‍♂️

This is a LLaMA-7B-based language model fine-tuned for the psychology domain using Reinforcement Learning from Human Feedback (RLHF). To learn more about RLHF, I recommend this great blog post on Hugging Face. For insights into the process of fine-tuning with RLHF, there is another great blog post, also from Hugging Face, found here!

Links: Reward model; Base model

Background 💡

This model was developed as part of a thesis project in machine learning and psychology. The goal of the thesis was to compare reinforcement learning from human feedback (RLHF) with reinforcement learning from AI feedback (RLAIF). Evaluation showed that the model performed significantly better (average score 2.70 out of 4) than the base model (1.22), but significantly worse than ChatGPT (3.20). Further, the evaluation found no significant difference between this model and the RLAIF model (2.98). It was trained on a total of 2,000 data points for 4 hours on a single A100 GPU through Google Colab. Although the model sometimes gives appropriate answers, it suffers from the repetition problem: it tends to repeat the same phrases or sentences within a single answer.

Paper 📜

"Comparison Between RLHF and RLAIF in Fine-Tuning a Large Language Model"

The paper can be found here!

Usage 🏂

We recommend using samhog/psychology-alpaca-merged as the base model. This combination still produces some answers that suffer from the repetition problem, but less often than samhog/psychology-llama-merged does.

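from peft import PeftModel
from transformers import LlamaTokenizer, LlamaForCausalLM

# Tokenizer of the original LLaMA-7B checkpoint
tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")

# Recommended base model, loaded in 8-bit (requires bitsandbytes) and
# placed automatically on the available devices (requires accelerate)
model = LlamaForCausalLM.from_pretrained(
    "samhog/psychology-alpaca-merged",
    load_in_8bit=True,
    device_map="auto",
)
# Apply the RLHF-trained adapter weights from this repository on top of the base model
model = PeftModel.from_pretrained(model, "samhog/psychology-llama-rlhf")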
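The snippet above only loads the model. Prompting it and checking its answers could look like the minimal sketch below. It rests on two assumptions that this card does not state: that the model expects the Stanford Alpaca prompt template (plausible given the psychology-alpaca-merged base, but unverified), and that a simple repeated-word-trigram ratio is a useful flag for answers affected by the repetition problem. The names `build_prompt`, `repeated_ngram_ratio` and `looks_repetitive`, as well as the 0.3 threshold, are hypothetical and not part of this repository.

```python
# Hypothetical helpers, not part of this repository. The Alpaca-style prompt
# template is an ASSUMPTION based on the psychology-alpaca-merged base model.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(instruction: str) -> str:
    """Wrap a user question in the (assumed) Alpaca instruction format."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams in `text` that duplicate an earlier n-gram.

    0.0 means no n-gram repeats; values close to 1.0 mean the text loops.
    """
    words = text.split()
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return 1.0 - len(set(ngrams)) / len(ngrams)


def looks_repetitive(text: str, n: int = 3, threshold: float = 0.3) -> bool:
    """Flag an answer whose repeated-n-gram ratio exceeds an arbitrary threshold."""
    return repeated_ngram_ratio(text, n) > threshold


if __name__ == "__main__":
    print(build_prompt("How can I cope with exam anxiety?"))
    looping = "I understand. I understand. I understand. I understand."
    print(repeated_ngram_ratio(looping), looks_repetitive(looping))
```

With the model and tokenizer loaded as above, the string returned by `build_prompt` could be tokenized with `tokenizer(prompt, return_tensors="pt")` and passed to `model.generate(...)`. The transformers `GenerationConfig` class also offers `repetition_penalty` and `no_repeat_ngram_size` to discourage loops at decoding time, and `looks_repetitive` can then serve as a post-hoc check on the decoded answer.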

Authors: Samuel Höglund, samhog@kth.se; Josef Khedri, jkhedri@kth.se