---
license: bigscience-openrail-m
language:
- en
---
GPT-J-Pyg_PPO-6B [GPT-J Pygmalion Dev V8p4 + GPT-J PPO_HH]

GPT-J-Pyg_PPO-6B is an experimental model containing a parameter-wise 50/50 blend (weighted average) of the weights of ppo_hh_gpt-j and Pygmalion-6b Dev V8p4.

-Intended Merge Value-

As with fine-tuning, merging weights does not add information but transforms it, so it is important to consider trade-offs.
Pyg_PPO combines ppo_hh_gpt-j and Pygmalion-6b, blending both technical achievements with the intent of elevating the strengths of each.
The datasets of both models are linked below to assist exploratory speculation on which datasets, in what quantity and configuration, have the largest impact on a model's usefulness without the expense of fine-tuning.
The blend was computed in FP32 and the output saved in FP16.
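
For illustration, a minimal sketch of such a parameter-wise 50/50 merge in PyTorch, assuming both parents share the same GPT-J-6B architecture and state_dict keys; this is an assumption for illustration, not the actual merge script credited below.

```python
import torch
from transformers import AutoModelForCausalLM

# Load both parents in FP32, since the blend is computed at full precision.
# Note: this needs enough RAM to hold two 6B models in FP32 at once.
model_a = AutoModelForCausalLM.from_pretrained(
    "reciprocate/ppo_hh_gpt-j", torch_dtype=torch.float32)
model_b = AutoModelForCausalLM.from_pretrained(
    "PygmalionAI/pygmalion-6b", torch_dtype=torch.float32)  # Dev V8p4 checkpoint assumed

# Average every tensor in place: 0.5 * A + 0.5 * B.
state_b = model_b.state_dict()
with torch.no_grad():
    for name, param in model_a.state_dict().items():
        param.copy_(0.5 * param + 0.5 * state_b[name])

# Output in FP16, as described above.
model_a.half()
model_a.save_pretrained("GPT-J-Pyg_PPO-6B")
```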

-Intended Use-

Research purposes only, intended for responsible use.
Express a conversation in natural language, and Pyg_PPO will continue it.
Try starting with a two-line prompt such as:
```
Bot: "Hello, how are you?"
You: "I am doing just fine, thank you."
```
Or start with any other topic, and the model will carry on in this back-and-forth format.
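
As a hedged illustration, the prompt format above can be driven with the transformers library roughly as follows; the repo id "TehVenom/GPT-J-Pyg_PPO-6B" is assumed here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; adjust to wherever the merged weights are hosted.
repo = "TehVenom/GPT-J-Pyg_PPO-6B"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

# Two-line conversational prompt in the format shown above.
prompt = ('Bot: "Hello, how are you?"\n'
          'You: "I am doing just fine, thank you."\n'
          'Bot:')
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=60, do_sample=True,
                        pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```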

It can also be used as a base for merging with other creative, technical, or adventure-themed models of the same class (GPT-J and 6B NeoX) and parameter size (6B), to experiment with how the morphology of model weights changes with the value added by instruct training.

The merge was tested using KoboldAI with Nucleus Sampling Top-P set to 0.9, Temperature at 0.6, and Repetition Penalty at 1.1; extra samplers disabled.
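
Reusing the model, tokenizer, and inputs from the sketch above, those KoboldAI settings map onto transformers generation arguments roughly as follows; an assumed equivalence, not a verified one.

```python
# Sampler settings from the KoboldAI test above, expressed as
# transformers generation arguments; extra samplers left at defaults.
output = model.generate(
    **inputs,
    do_sample=True,
    top_p=0.9,               # Nucleus Sampling Top-P
    temperature=0.6,         # Temperature
    repetition_penalty=1.1,  # Repetition Penalty
    max_new_tokens=60,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```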

-Credits To-

Core Model:
https://huggingface.co/EleutherAI/gpt-j-6B
Author:
https://www.eleuther.ai/

Model 1; 50% ppo_hh_gpt-j:
https://huggingface.co/reciprocate/ppo_hh_gpt-j

Author Repo:
https://huggingface.co/reciprocate

Related; CarperAI:
https://huggingface.co/CarperAI

The dataset is a variant of the Helpful and Harmless assistant-themed dataset trained with Proximal Policy Optimization; the specific datasets used are unknown. The repo's listed datasets include:
https://huggingface.co/datasets/reciprocate/summarize_eval_ilql
https://huggingface.co/datasets/reciprocate/hh_eval_ilql

PPO explained:
https://paperswithcode.com/method/ppo
Potential HH-type datasets utilized:
https://huggingface.co/HuggingFaceH4
https://huggingface.co/datasets/Anthropic/hh-rlhf

Model 2; 50% Pygmalion-6b:
https://huggingface.co/PygmalionAI/pygmalion-6b

Author Repo:
https://huggingface.co/PygmalionAI

Weight merge script credit to Concedo:
https://huggingface.co/concedo

Model card template credit to Digitous:
https://huggingface.co/digitous/GPT-R