Update README.md
README.md
CHANGED
@@ -1,5 +1,5 @@
 # Psychology LLaMA RLAIF 🦙🙋♂🤖
-This is a LLaMA-7B-based language model trained in the field of psychology using Reinforcement Learning from AI. To learn more about RLAIF, I recommend [this](https://www.anthropic.com/index/constitutional-ai-harmlessness-from-ai-feedback) great, revolutionizing 2022 paper by Anthropic. For some insights in the process of fine-tuning using RLHF, which is a very similar process, there is a great blogpost on Hugging Face found [here!](https://huggingface.co/blog/stackllama)
+This is a LLaMA-7B-based language model trained in the field of psychology using Reinforcement Learning from AI Feedback. To learn more about RLAIF, I recommend [this](https://www.anthropic.com/index/constitutional-ai-harmlessness-from-ai-feedback) great, revolutionizing 2022 paper by Anthropic. For some insights in the process of fine-tuning using RLHF, which is a very similar process, there is a great blogpost on Hugging Face found [here!](https://huggingface.co/blog/stackllama)
 
 **Links**: [Reward model](https://huggingface.co/samhog/RLAIF-psychology-alpaca-rm); [Base model](https://huggingface.co/samhog/psychology-llama-merged)
 