[Research Log: RLFCV2, Petertodd, the paperclip maximizer](https://www.lesswrong.com/posts/doLkRMasXMKyafJrz/research-log-rlfcv2-training-phi-1-5-gpt2xl-and-falcon-rw-1b) Note: these versions of the training include one modification. All words/tokens pertaining to paperclips were replaced with "stamps."