samhog committed
Commit 0451f0f • 1 parent: ed5f06a

Update README.md

Files changed (1): README.md (+25, -2)
README.md CHANGED
# Psychology Alpaca 🍩
This is a LLaMA-7B language model fine-tuned on 10,000 psychology-related prompts and answers generated by ChatGPT. The model was trained on a single A100 GPU from Google Colab. It shows some knowledge in the field of psychology and generally performs better than its base model.
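
The card does not show the dataset schema, but instruction-tuning sets of this kind are commonly stored as Alpaca-style records. The snippet below is a purely illustrative sketch of what one prompt-answer pair might look like; the field names (`instruction`, `input`, `output`) and the example text are assumptions, not taken from the actual training data.

```python
# Hypothetical training record, assuming the common Alpaca-style schema
# (instruction / input / output). The real dataset format is not shown in this card.
example_record = {
    "instruction": "Explain what cognitive behavioral therapy is and when it is typically used.",
    "input": "",
    "output": "Cognitive behavioral therapy (CBT) is a structured, goal-oriented form of talk therapy ...",
}
```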

### Background 💡
This model was developed as part of a thesis project in the field of machine learning and psychology. It was used as a base model for further fine-tuning using reinforcement learning. The goal of the thesis was to compare reinforcement learning from *human feedback* and *AI feedback*.

### Paper 📜
"Comparison Between RLHF and RLAIF in Fine-Tuning a Large Language Model"

The paper can be found [here](https://www.diva-portal.org/smash/record.jsf?dswid=3835&pid=diva2%3A1782683&c=2&searchType=SIMPLE&language=en&query=rlhf&af=%5B%5D&aq=%5B%5B%5D%5D&aq2=%5B%5B%5D%5D&aqe=%5B%5D&noOfRows=50&sortOrder=author_sort_asc&sortOrder2=title_sort_asc&onlyFullText=false&sf=undergraduate)!

### Usage 🍂
```python
from peft import PeftModel
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig

# Load the tokenizer of the base LLaMA-7B model
tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")

# Load the base model weights in 8-bit (requires bitsandbytes and a GPU)
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)

# Add the PEFT adapter on top of the base weights to obtain the Psychology Alpaca weights
model = PeftModel.from_pretrained(model, "kth/psychology-alpaca")
```
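
Once the adapter is loaded, the model can be queried like any other causal LM in transformers. The sketch below assumes the `model` and `tokenizer` objects from the block above and an Alpaca-style instruction prompt; the prompt template and generation settings are illustrative assumptions, since the card does not specify the exact format used during training.

```python
# Minimal generation sketch; the prompt template and sampling settings are assumptions.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWhat is cognitive behavioral therapy?\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    max_new_tokens=256,
)

outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```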

**Links**: [RLHF model](https://huggingface.co/samhog/psychology-llama-rlhf); [RLAIF model](https://huggingface.co/samhog/psychology-llama-rlaif)