---
language: en
license: apache-2.0
commercial: 'no'
inference: false
---

# GPT-J 6B - PPO_Pygway Mix

## Model description
This is a merged model, using a weighted parameter blend strategy at a (20:20:60) ratio between the models:

- [20%] - KoboldAI/GPT-J-6B-Janeway: https://huggingface.co/KoboldAI/GPT-J-6B-Janeway
- [20%] - reciprocate/ppo_hh_gpt-j: https://huggingface.co/reciprocate/ppo_hh_gpt-j
- [60%] - Pygmalion/Pygmalion-6b DEV (V8 / Part 4): https://huggingface.co/Pygmalion/Pygmalion-6b

By their respective authors.

**Warning: PPO_Pygway-V8p4_Dev-6b may generate NSFW or inappropriate content, as its base models (mainly [Pygmalion/Pygmalion-6b V8P4](https://huggingface.co/Pygmalion/Pygmalion-6b)) were trained on general user logs and internet archives.**

### Intended Use:

Research purposes only, intended for responsible use.
Express a conversation in natural language, and PPO_Pygway will pick up on the conversational format.
Try starting a two-line prompt such as:
```
Bot: "Hello, how are you?"
You: "I am doing just fine, thank you."
```
Or any other topic, and the model will carry on in this back-and-forth style.
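The chat format above is plain text, so prompts can be assembled programmatically. A minimal sketch, where the `build_prompt` helper is hypothetical (for illustration only, not part of any model API):

```python
# Hypothetical helper: renders (speaker, text) turns in the
# Bot:/You: back-and-forth format the model picks up on.
def build_prompt(turns):
    return "\n".join(f'{speaker}: "{text}"' for speaker, text in turns)

prompt = build_prompt([
    ("Bot", "Hello, how are you?"),
    ("You", "I am doing just fine, thank you."),
])
print(prompt)
# Bot: "Hello, how are you?"
# You: "I am doing just fine, thank you."
```

The resulting string is fed to the model as-is; see the Pygmalion model card for the exact persona formatting that model was tuned on.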

## Information:
For more details, check out the related source models, especially [Pygmalion/Pygmalion-6b V8P4](https://huggingface.co/Pygmalion/Pygmalion-6b) for more information on the chat-bot formatting the model expects.

In a similar manner to fine-tuning, merging weights does not add information but transforms it, so it is important to consider trade-offs.
PPO_Pygway combines `ppo_hh_gpt-j`, `Janeway-6b` and `Pygmalion-6b V8P4`; all three models were blended in a two-step process using a simple weighted parameter method:
```
(X*A + Y*B)
```
where A and B are the model parameters (weights), and X and Y are how strongly each is represented within the final value.
The intent of this is to elevate the end-model by borrowing the strongly represented aspects of each base model,
but it may also weaken other facets of each model, which can be desirable if the base models have problematic traits that need to be worked on.

The blend was done in FP32 and the output saved in FP16 for reduced storage needs.
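The two-step blend described above can be sketched as follows. This is a toy illustration using single-value "models" in place of real 6B state dicts; the `blend` helper and the 50:50-then-40:60 step order are assumptions (the card only states the final 20:20:60 ratio and that two steps were used):

```python
def blend(a, b, x, y):
    # Weighted parameter blend (X*A + Y*B), applied per parameter.
    return {k: x * a[k] + y * b[k] for k in a}

# Toy one-parameter stand-ins for the real model state dicts.
janeway   = {"w": 1.0}   # GPT-J-6B-Janeway
ppo_hh    = {"w": 2.0}   # ppo_hh_gpt-j
pygmalion = {"w": 3.0}   # Pygmalion-6b V8P4

# Step 1 (assumed): merge the two 20% models at equal weight.
intermediate = blend(janeway, ppo_hh, 0.5, 0.5)

# Step 2 (assumed): merge the result with Pygmalion at 40:60, which
# yields the stated 20:20:60 overall ratio (0.4 * 0.5 = 0.2 each
# for the first two models).
final = blend(intermediate, pygmalion, 0.4, 0.6)

assert abs(final["w"] - (0.2*1.0 + 0.2*2.0 + 0.6*3.0)) < 1e-12
```

In a real merge the same arithmetic runs over every tensor in the state dicts, in FP32, with the result cast to FP16 before saving.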

## Limitations and biases
Based on known problems with NLP technology, potential relevant factors include bias (gender, profession, race and religion).

<ins>Warning: This model has a moderate NSFW bias.</ins>

### License
GPT-J-6B is licensed by EleutherAI under the Apache 2.0 license.

### BibTeX entry and citation info
```
@misc{gpt-j,
  author = {Wang, Ben and Komatsuzaki, Aran},
  title = {{GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model}},
  howpublished = {\url{https://github.com/kingoflolz/mesh-transformer-jax}},
  year = 2021,
  month = May
}
```

### Credits To:

Models involved:
- https://huggingface.co/EleutherAI/gpt-j-6B
- https://huggingface.co/Pygmalion/Pygmalion-6b
- https://huggingface.co/reciprocate/ppo_hh_gpt-j
- https://huggingface.co/KoboldAI/GPT-J-6B-Janeway

Average-weights merging script credit to Concedo:
- https://huggingface.co/concedo

### Related datasets and articles:

PPO_HH-GPT-J-6b's dataset is a variant of the Helpful and Harmless assistant-themed dataset, trained with Proximal Policy Optimization; the specific datasets used are unknown. The datasets listed in its repo include:
- https://huggingface.co/datasets/reciprocate/summarize_eval_ilql
- https://huggingface.co/datasets/reciprocate/hh_eval_ilql

PPO explained:
- https://paperswithcode.com/method/ppo

Potential HH-type datasets utilized:
- https://huggingface.co/HuggingFaceH4
- https://huggingface.co/datasets/Anthropic/hh-rlhf

No formal evaluation is available for this model at this time.

It is recommended to use this model with the KoboldAI software. All feedback and comments can be directed to TeH_Venom on the KoboldAI Discord.
93