Update README.md
README.md
CHANGED
@@ -1,7 +1,10 @@
 ---
 license: mit
 datasets: pt-sk/toxic_classification
-tags:
+tags:
+- PPO
+- RLHF
+pipeline_tag: text-generation
 ---
 Aligning the model using Proximal Policy Optimization (PPO). The goal is to train the model to generate non-toxic reviews. The training process utilizes the `trl` library for reinforcement learning, the `transformers` library for model handling, and `datasets` for dataset management.
 Implementation code is available here: [GitHub](https://github.com/sathishkumar67/GPT-2-Non-Toxic-RLHF)
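
The README names PPO as the alignment method but does not show the objective it optimizes. As a rough illustration only, the sketch below computes the standard PPO clipped surrogate objective in plain Python. It is not taken from the linked repository and does not use the `trl` API; the function name `ppo_clipped_objective`, its parameters, and the default `clip_eps=0.2` are assumptions made for this example. In a setup like the one described, the advantages would presumably be derived from a reward that scores generated reviews as non-toxic.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of samples.

    logp_new / logp_old: log-probabilities of the sampled actions (tokens)
    under the current policy and the policy that generated the samples.
    advantages: advantage estimates for the same samples (hypothetical here;
    in practice they come from a reward signal and a value estimate).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the current and the sampling policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

The objective is maximized during training. Clipping removes the incentive to move the policy far from the one that produced the samples in a single update, which is the property that makes PPO comparatively stable for fine-tuning language models.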