---
license: bigscience-openrail-m
language:
- en
---
GPT-J-Pyg_PPO-6B [GPT-J Pygmalion Dev V8p4 + GPT-J PPO_HH]

GPT-J-Pyg_PPO-6B is an experimental model containing a parameter-wise 50/50 blend (weighted average) of the weights of ppo_hh_gpt-j and Pygmalion-6b Dev V8p4.

-Intended Merge Value-

As with fine-tuning, merging weights does not add information but transforms it, so it is important to consider the trade-offs. Pyg_PPO combines ppo_hh_gpt-j and Pygmalion-6b; the two technical achievements are blended with the intent of elevating the strengths of each. The datasets of both are linked below to assist exploratory speculation on which datasets, in what quantity and configuration, have the largest impact on a model's usefulness without the expense of fine-tuning. The blend was done in FP32 and the output saved in FP16.

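To make "parameter-wise 50/50 blend" concrete, here is a minimal sketch of such a merge using torch and transformers. The parent repo ids are the ones credited below; the output directory and the script itself are illustrative assumptions, since the released blend was produced with Concedo's merge script and used the Dev V8p4 revision of Pygmalion-6b.

```python
# Minimal sketch of a parameter-wise 50/50 weighted-average merge.
# Illustrative only: the released blend was made with Concedo's script,
# and the exact Pygmalion-6b Dev V8p4 revision id is not reproduced here.
import torch
from transformers import AutoModelForCausalLM

# Load both parents in FP32 so the averaging happens at full precision.
model_a = AutoModelForCausalLM.from_pretrained(
    "reciprocate/ppo_hh_gpt-j", torch_dtype=torch.float32)
model_b = AutoModelForCausalLM.from_pretrained(
    "PygmalionAI/pygmalion-6b", torch_dtype=torch.float32)

state_a = model_a.state_dict()
state_b = model_b.state_dict()

# Parameter-wise blend: every tensor becomes the mean of its two parents.
merged = {name: 0.5 * state_a[name] + 0.5 * state_b[name] for name in state_a}

model_a.load_state_dict(merged)
model_a.half()                               # write the blend out in FP16
model_a.save_pretrained("GPT-J-Pyg_PPO-6B")  # hypothetical local output path
```

Averaging in FP32 and only casting to FP16 at the end avoids compounding rounding error from the two parents.
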
-Intended Use-

Research purposes only, intended for responsible use.
Express a conversation in natural language, and Pyg_PPO will continue it.
Try starting with a two-line prompt such as:
```
Bot: "Hello, how are you?"
You: "I am doing just fine, thank you."
```
or any other topic, and the model will carry on in this back-and-forth format.

It can also be used as a base to merge with other creative, technical, or adventure-themed models of the same class (GPT-J & 6b NeoX) and parameter size (6b), to experiment with how the morphology of model weights changes with the value added by instruct tuning.

Merge tested using KoboldAI with Nucleus Sampling (Top-P) set to 0.9, Temperature at 0.6, and Repetition Penalty at 1.1; extra samplers disabled.

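For readers without KoboldAI, the sketch below approximates those settings with plain transformers sampling parameters. It is an assumption about how to reproduce the quoted configuration, not the author's test setup; the local checkpoint path follows the merge sketch above, and the tokenizer is the stock GPT-J one.

```python
# Assumed reproduction of the quoted KoboldAI settings with transformers:
# Top-P 0.9, Temperature 0.6, Repetition Penalty 1.1, no other samplers.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")  # stock GPT-J tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "GPT-J-Pyg_PPO-6B", torch_dtype=torch.float16)  # local blend from the sketch above

prompt = 'Bot: "Hello, how are you?"\nYou: "I am doing just fine, thank you."\nBot:'
inputs = tokenizer(prompt, return_tensors="pt")

output = model.generate(
    **inputs,
    do_sample=True,
    top_p=0.9,               # Nucleus Sampling Top-P
    temperature=0.6,         # Temperature
    repetition_penalty=1.1,  # Repetition Penalty
    max_new_tokens=80,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
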
-Credits To-

Core Model:
https://huggingface.co/EleutherAI/gpt-j-6B
Author:
https://www.eleuther.ai/

Model1; 50% ppo_hh_gpt-j:
https://huggingface.co/reciprocate/ppo_hh_gpt-j

Author Repo:
https://huggingface.co/reciprocate

Related; CarperAI:
https://huggingface.co/CarperAI

The dataset is a variant of the Helpful and Harmless assistant-themed dataset, used with Proximal Policy Optimization; the specific datasets used are unknown. Datasets listed in the repo include:
https://huggingface.co/datasets/reciprocate/summarize_eval_ilql
https://huggingface.co/datasets/reciprocate/hh_eval_ilql

PPO explained:
https://paperswithcode.com/method/ppo
Potential HH-type datasets utilized:
https://huggingface.co/HuggingFaceH4
https://huggingface.co/datasets/Anthropic/hh-rlhf

Model2; 50% Pygmalion-6b:
https://huggingface.co/PygmalionAI/pygmalion-6b

Author Repo:
https://huggingface.co/PygmalionAI

Weight merge script credit to Concedo:
https://huggingface.co/concedo

Model card template credit to Digitous:
https://huggingface.co/digitous/GPT-R