DanielSc4/RedPajama-INCITE-Chat-3B-v1-RL-LoRA-8bit-test1

DanielSc4 committed on Aug 10, 2023

Commit a2ee88a • 1 Parent(s): fe340d2

Create README.md

Files changed (1)

README.md +18 -0

README.md ADDED

	@@ -0,0 +1,18 @@

+---
+license: apache-2.0
+language:
+- en
+pipeline_tag: text-generation
+---
+RedPajama-INCITE-Chat-3B-v1 pre-trained model, fine-tuned with reinforcement learning on the [DIALOCONAN](https://github.com/marcoguerini/CONAN#dialoconan) dataset, using [facebook/roberta-hate-speech-dynabench-r4-target](https://huggingface.co/facebook/roberta-hate-speech-dynabench-r4-target) as the reward model.
+Toxicity results on the [allenai/real-toxicity-prompts](https://huggingface.co/datasets/allenai/real-toxicity-prompts) dataset, using custom prompts (see 🥞[RewardLM](https://github.com/DanielSc4/RewardLM) for details):
+| Model variant | Toxicity level (RedPajama-INCITE-Chat-3B) |
+|:-------------:|:-----------------------------------------:|
+| Pre-Trained | 0.217 |
+| [Fine-Tuned](https://huggingface.co/DanielSc4/RedPajama-INCITE-Chat-3B-v1-FT-LoRA-8bit-test1) | **0.129** |
+| **RL (this)** | 0.160 |
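
Reviewer note: the three toxicity values in the added table are easier to compare as relative reductions against the pre-trained baseline. The sketch below is not part of the commit; it only redoes the arithmetic on the numbers copied from the table, and it assumes that a lower toxicity value is better (consistent with the README bolding 0.129). The helper name `relative_reduction` is made up for illustration.

```python
# Toxicity values copied from the README table (assumed: lower is better).
scores = {
    "Pre-Trained": 0.217,
    "Fine-Tuned": 0.129,
    "RL (this)": 0.160,
}


def relative_reduction(baseline: float, score: float) -> float:
    """Return the fraction by which `score` is lower than `baseline`."""
    return (baseline - score) / baseline


baseline = scores["Pre-Trained"]

# Relative reduction of each non-baseline variant versus the pre-trained model.
reductions = {
    name: relative_reduction(baseline, value)
    for name, value in scores.items()
    if name != "Pre-Trained"
}

for name, reduction in reductions.items():
    print(f"{name}: {reduction:.1%} lower than Pre-Trained")
```

Under these assumptions, the fine-tuned variant comes out roughly 41% below the baseline and the RL variant in this repository roughly 26% below it.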