datasets:
  - samhog/psychology-10k

Psychology Alpaca 🍩

This is a LLaMA-7B language model fine-tuned on 10,000 psychology-related prompts and answers generated by ChatGPT. The model was trained on a single A100 GPU on Google Colab. It shows some knowledge of psychology and generally performs better than its LLaMA-7B base model.

Background 💡

This model was developed as part of a thesis project in the field of machine learning and psychology. It was used as a base model for further fine-tuning using reinforcement learning. The goal of the thesis was to compare reinforcement learning from human feedback and AI feedback.

Paper 📜

"Comparison Between RLHF and RLAIF in Fine-Tuning a Large Language Model"

The paper can be found here!

Usage

from peft import PeftModel
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig

# Tokenizer of the LLaMA-7B base model
tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")

# Load the base model weights in 8-bit (requires the bitsandbytes package)
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)

# Apply the PEFT adapter on top of the base weights to get the Psychology Alpaca weights
model = PeftModel.from_pretrained(model, "kth/psychology-alpaca")
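
Once the model and tokenizer are loaded, a response can be generated roughly as sketched below. This is a minimal sketch, not the authors' inference code: it assumes the model expects the standard Alpaca instruction template (suggested by the model name, but not stated in this README), and the helper names `build_prompt`, `extract_response`, and `generate_answer` as well as the sampling values are hypothetical.

```python
# ASSUMPTION: the standard Alpaca prompt template (no-input variant).
# The README does not confirm which template was used during training.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the assumed Alpaca template."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


def extract_response(decoded: str) -> str:
    """Return the text after the response marker, or the whole string if absent."""
    return decoded.split("### Response:", 1)[-1].strip()


def generate_answer(model, tokenizer, instruction: str, max_new_tokens: int = 256) -> str:
    """Generate one answer with an already loaded model and tokenizer."""
    # Imported lazily so the prompt helpers above work without torch installed.
    import torch
    from transformers import GenerationConfig

    prompt = build_prompt(instruction)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Illustrative sampling settings, not values taken from the thesis.
    config = GenerationConfig(
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        max_new_tokens=max_new_tokens,
    )

    with torch.no_grad():
        output_ids = model.generate(**inputs, generation_config=config)

    decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return extract_response(decoded)
```

With the `model` and `tokenizer` objects from the snippet above, a call such as `generate_answer(model, tokenizer, "I have trouble sleeping. What can I do?")` would return the generated text that follows the response marker.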

Links: RLHF model; RLAIF model

Authors: Samuel Höglund, samhog@kth.se; Josef Khedri, jkhedri@kth.se