samhog committed on
Commit
7e1a0e3
1 Parent(s): 58d577c

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -1,5 +1,5 @@
  # Psychology LLaMA RLAIF 🦙🙋‍♂🤖
- This is a LLaMA-7B-based language model trained in the field of psychology using Reinforcement Learning from AI. To learn more about RLAIF, I recommend [this](https://www.anthropic.com/index/constitutional-ai-harmlessness-from-ai-feedback) great, revolutionizing 2022 paper by Anthropic. For some insights in the process of fine-tuning using RLHF, which is a very similar process, there is a great blogpost on Hugging Face found [here!](https://huggingface.co/blog/stackllama)
 
  **Links**: [Reward model](https://huggingface.co/samhog/RLAIF-psychology-alpaca-rm); [Base model](https://huggingface.co/samhog/psychology-llama-merged)
 
 
  # Psychology LLaMA RLAIF 🦙🙋‍♂🤖
+ This is a LLaMA-7B-based language model trained in the field of psychology using Reinforcement Learning from AI Feedback. To learn more about RLAIF, I recommend [this](https://www.anthropic.com/index/constitutional-ai-harmlessness-from-ai-feedback) great, revolutionizing 2022 paper by Anthropic. For some insights in the process of fine-tuning using RLHF, which is a very similar process, there is a great blogpost on Hugging Face found [here!](https://huggingface.co/blog/stackllama)
 
  **Links**: [Reward model](https://huggingface.co/samhog/RLAIF-psychology-alpaca-rm); [Base model](https://huggingface.co/samhog/psychology-llama-merged)