FunkEngine committed on
Commit
234145d
1 Parent(s): fb9e575

Upload 2 files

Files changed (2)
  1. README.md +66 -0
  2. config.json +26 -0
README.md ADDED
@@ -0,0 +1,66 @@
+ ---
+ language:
+ - en
+ thumbnail: null
+ tags:
+ - text generation
+ - instruct
+ pipeline_tag: text-generation
+ inference: false
+ license: llama2
+ datasets:
+ - SchweinZwei/PIPPA
+ - Open-Orca/OpenOrca
+ - Norquinal/claude_multiround_chat_30k
+ - jondurbin/airoboros-gpt4-1.4.1
+ - databricks/databricks-dolly-15k
+ ---
+ <h1 style="text-align: center">SchweinZwei/SchweinZwei-13b</h1>
+ <h2 style="text-align: center">An instruction-tuned Llama-2 biased towards fiction writing and conversation.</h2>
+
+ ## Model Details
+
+ The long-awaited release of our new models based on Llama-2 is finally here. SchweinZwei-13b (formerly known as Metharme) is based on
+ [Llama-2 13B](https://huggingface.co/meta-llama/llama-2-13b-hf) released by Meta AI.
+
+ The Metharme models were an experiment to build a model that is usable for conversation, roleplaying and storywriting,
+ but which can be guided using natural language like other instruct models. After much deliberation, we concluded
+ that the Metharme prompting format is superior to (and easier to use than) the classic Schweinen.
+
+ This model was trained with supervised fine-tuning on a mixture of regular instruction data alongside roleplay, fictional stories
+ and conversations with synthetically generated instructions attached.
+
+ This model is freely available for both commercial and non-commercial use, as per the Llama-2 license.
+
+
+ ## Prompting
+
+ The model has been trained on prompts using three different roles, which are denoted by the following tokens: `<|system|>`, `<|user|>` and `<|model|>`.
+
+ The `<|system|>` prompt can be used to inject out-of-band information behind the scenes, while the `<|user|>` prompt should be used to indicate user input.
+ The `<|model|>` token should then be used to indicate that the model should generate a response. These tokens can occur multiple times and be chained
+ to form a conversation history.
+
+ ### Prompting example
+
+ The system prompt has been designed to allow the model to "enter" various modes and dictate the reply length. Here's an example:
+
+ ```
+ <|system|>Enter RP mode. Pretend to be {{char}} whose persona follows:
+ {{persona}}
+
+ You shall reply to the user while staying in character, and generate long responses.
+ ```
+
+ ## Dataset
+ The dataset used to fine-tune this model includes our own [PIPPA], along with several other instruction
+ datasets, and datasets acquired from various RP forums.
+
+ ## Limitations and biases
+
+ The intended use-case for this model is fictional writing for entertainment purposes. Any other sort of usage is out of scope.
+
+ As such, it was **not** fine-tuned to be safe and harmless: the base model _and_ this fine-tune have been trained on data known to contain profanity and texts that are lewd or otherwise offensive. It may produce socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive. Outputs might often be factually wrong or misleading.
+
+ ## Acknowledgements
+ We would like to thank [SpicyChat](https://spicychat.ai/) for sponsoring the training for this model.
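
The role-token format described in the README's Prompting section can be assembled programmatically. The following is a minimal sketch, not the authors' code: the helper names `fill_placeholders` and `build_prompt` are hypothetical, the character name and persona are made-up illustration values, and it assumes the role tokens are concatenated directly with no separator between segments, which the README does not state explicitly.

```python
from typing import List, Tuple

# Role tokens as listed in the README's Prompting section.
SYSTEM, USER, MODEL = "<|system|>", "<|user|>", "<|model|>"
ROLE_TOKENS = {"user": USER, "model": MODEL}


def fill_placeholders(template: str, char: str, persona: str) -> str:
    """Substitute the {{char}} and {{persona}} placeholders (hypothetical helper)."""
    return template.replace("{{char}}", char).replace("{{persona}}", persona)


def build_prompt(system: str, turns: List[Tuple[str, str]]) -> str:
    """Chain a system prompt and (role, text) turns, ending with <|model|>.

    Assumption: segments are joined with no whitespace between them.
    """
    parts = [SYSTEM + system]
    for role, text in turns:
        parts.append(ROLE_TOKENS[role] + text)
    # A trailing <|model|> token signals that the model should reply next.
    parts.append(MODEL)
    return "".join(parts)


# System prompt template copied from the README example.
template = (
    "Enter RP mode. Pretend to be {{char}} whose persona follows:\n"
    "{{persona}}\n\n"
    "You shall reply to the user while staying in character, and generate long responses."
)
# "Aria" and the persona text are invented purely for illustration.
system_prompt = fill_placeholders(template, "Aria", "A cheerful ship's navigator.")
prompt = build_prompt(system_prompt, [("user", "Where are we headed?")])
print(prompt)
```

Earlier exchanges can be passed as additional `("user", ...)` and `("model", ...)` turns to form the conversation history the README mentions.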
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "_name_or_path": "SchweinZwei/SchweinZwei-13b",
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "badwordsids": "[[29961], [14352], [24630], [29962], [11759], [15974], [5519], [25473], [18899], [25901], [7110], [9341], [13531], [518], [9310], [2636], [3366], [21069], [11970], [23098], [16733], [21298], [18173], [10846], [3816], [28513], [15625], [23192], [28166], [10062], [1385], [11724], [3108], [15555], [10834], [10370], [14330], [1822], [12436], [5262], [17094], [10725], [17077], [11424], [4197], [24406], [13359], [17531], [24566], [23076], [4514], [13192], [19942], [16261], [7072], [6024], [1402], [1839], [2033], [13970], [850], [5913], [28895], [5387], [8308], [24927], [5691], [12940], [19997], [18959], [11287], [16862], [4638], [22322], [29861], [21251], [14704], [17548], [12452], [17288], [23160], [24960], [8219], [18024], [5539], [7464], [27865], [29588], [20068], [19660], [27706], [22896], [24264], [12258], [2314], [4400], [5586], [12622], [6796], [7226], [21939], [18456], [14178], [21540], [21945], [14664], [16215], [10338], [17361], [7503], [13769], [26073], [9601], [26909], [7961], [8999], [20840], [16272], [21545], [3199], [10514], [5159], [22689], [6525], [20526], [27077], [18017]]",
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 5120,
+   "initializer_range": 0.02,
+   "intermediate_size": 13824,
+   "max_position_embeddings": 4096,
+   "model_type": "llama",
+   "num_attention_heads": 40,
+   "num_hidden_layers": 40,
+   "num_key_value_heads": 40,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.33.0.dev0",
+   "use_cache": true,
+   "vocab_size": 32000
+ }
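
As a sanity check on the config.json values above, the sketch below derives the attention head dimension and an approximate parameter count from them. It is only an illustration: it assumes the standard bias-free Llama block layout (four attention projections, a three-matrix gated MLP, two RMSNorm weights per layer), embeds just a subset of the fields, and truncates `badwordsids` to its first three entries for brevity.

```python
import json

# Subset of the config.json fields shown above; "badwordsids" is truncated
# to its first three entries for brevity.
CONFIG_TEXT = """
{
  "badwordsids": "[[29961], [14352], [24630]]",
  "hidden_size": 5120,
  "intermediate_size": 13824,
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "num_key_value_heads": 40,
  "tie_word_embeddings": false,
  "vocab_size": 32000
}
"""
cfg = json.loads(CONFIG_TEXT)

h, ffn = cfg["hidden_size"], cfg["intermediate_size"]
layers, vocab = cfg["num_hidden_layers"], cfg["vocab_size"]

head_dim = h // cfg["num_attention_heads"]
# Equal head counts mean plain multi-head attention, not grouped-query attention.
uses_gqa = cfg["num_key_value_heads"] != cfg["num_attention_heads"]

# Per-layer weights under the assumed layout (valid here because uses_gqa is False):
attn = 4 * h * h      # q, k, v and output projections
mlp = 3 * h * ffn     # gate, up and down projections
norms = 2 * h         # two RMSNorm weight vectors
per_layer = attn + mlp + norms

embeddings = vocab * h
lm_head = 0 if cfg["tie_word_embeddings"] else vocab * h
total_params = layers * per_layer + embeddings + lm_head + h  # + final RMSNorm

# "badwordsids" is stored as a JSON string, so it needs a second decode.
bad_words_ids = json.loads(cfg["badwordsids"])

print(head_dim, uses_gqa, f"{total_params:,}", bad_words_ids)
```

With these values the head dimension is 128 and the total is 13,015,864,320 parameters, consistent with a 13B model. The decoded `bad_words_ids` is a list of single-token lists, the same shape that the `bad_words_ids` argument of `generate` in Transformers accepts; whether the stored list is intended for that purpose is an assumption, as the commit does not say.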