amirabdullah19852020 committed
Commit 9cbba88 (1 parent: fa07d8a)

Create README.md

Files changed (1)
README.md +11 -0
README.md ADDED
@@ -0,0 +1,11 @@
+ ---
+ license: mit
+ datasets:
+ - unalignment/toxic-dpo-v0.2
+ - Anthropic/hh-rlhf
+ - stanfordnlp/imdb
+ language:
+ - en
+ ---
+
+ We train a collection of models with RLHF on the datasets above. We use DPO for the hh-rlhf and unalignment/toxic-dpo-v0.2 preference datasets, and PPO to train a model to complete IMDB prefixes with positive sentiment.
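
As an illustration of the DPO stage, below is a minimal sketch assuming the TRL library (`DPOTrainer`/`DPOConfig`). The base checkpoint, hyperparameters, and output directory are placeholders not specified by this card, and `Anthropic/hh-rlhf` would need its `chosen`/`rejected` transcripts split into prompt and response before it could be used the same way.

```python
# Hypothetical sketch of the DPO stage with TRL; the base model, hyperparameters,
# and output directory are placeholders, not values taken from this card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "gpt2"  # placeholder: the card does not name the base checkpoint

model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# unalignment/toxic-dpo-v0.2 ships prompt/chosen/rejected columns, the format
# DPOTrainer expects; hh-rlhf needs a small preprocessing step first.
train_dataset = load_dataset("unalignment/toxic-dpo-v0.2", split="train")

args = DPOConfig(
    output_dir="dpo-toxic",         # placeholder
    per_device_train_batch_size=2,  # placeholder
    gradient_accumulation_steps=8,  # placeholder
    learning_rate=5e-7,             # placeholder
    beta=0.1,                       # strength of the KL penalty against the reference model
    logging_steps=10,
)

trainer = DPOTrainer(
    model=model,  # a frozen copy is used as the reference model when ref_model is omitted
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL releases call this argument `tokenizer`
)
trainer.train()
```

The PPO stage on stanfordnlp/imdb would follow the usual TRL sentiment-tuning recipe, where a sentiment classifier scores generated continuations and that score is used as the reward; the exact reward model and generation settings are not specified by this card.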