---
language:
- en
thumbnail: https://cdn.discordapp.com/attachments/886209362572476486/1067698349681164308/thumb-overlay.png
tags:
- pytorch
- causal-lm
license: agpl-3.0
pipeline_tag: text-generation
---
# 🪷 Lotus-12B
Lotus-12B is a GPT-NeoX 12B model fine-tuned on 2.5GB of light novels, erotica, annotated literature, and public-domain conversations. It is intended for generating novel-like fictional text and conversations.
## Model Description
The model used for fine-tuning is [Pythia 12B Deduped](https://github.com/EleutherAI/pythia), which is a 12 billion parameter auto-regressive language model trained on [The Pile](https://pile.eleuther.ai/).
## Training Data & Annotative Prompting
The fine-tuning data was gathered from various sources, such as [Project Gutenberg](https://www.gutenberg.org/). The annotated fiction dataset has tags prepended to each text to help steer generation towards a particular style. Here is an example prompt that shows how to use the annotations:
```
[ Title: The Dunwich Horror; Author: H. P. Lovecraft; Genre: Horror; Tags: 3rdperson, scary; Style: Dark ]
***
When a traveler in north central Massachusetts takes the wrong fork...
```
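The header format above can also be assembled programmatically. The following is a minimal sketch, assuming the field order and separators seen in the example prompt; `build_header` and `build_prompt` are hypothetical helper names used for illustration and are not part of any released tooling.

```python
def build_header(**fields):
    """Join non-empty annotation fields into a "[ Key: value; ... ]" header.

    Hypothetical helper: the field names (Title, Author, Genre, Tags, Style)
    are taken from the example prompt; list values are comma-joined.
    """
    parts = []
    for key, value in fields.items():
        if not value:
            continue  # skip annotations that were left empty
        if isinstance(value, (list, tuple)):
            value = ", ".join(value)
        parts.append(f"{key}: {value}")
    return "[ " + "; ".join(parts) + " ]"


def build_prompt(text, **fields):
    """Place the header, the *** separator line, and the opening text in order."""
    return f"{build_header(**fields)}\n***\n{text}"


prompt = build_prompt(
    "When a traveler in north central Massachusetts takes the wrong fork...",
    Title="The Dunwich Horror",
    Author="H. P. Lovecraft",
    Genre="Horror",
    Tags=["3rdperson", "scary"],
    Style="Dark",
)
print(prompt)
```

Omitting a keyword argument simply drops that annotation from the header, so a prompt can carry only the fields that matter for the desired style.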
Conversations were scraped from our [Discord server](https://discord.com/invite/touhouai) and from publicly available subreddits on [Reddit](https://www.reddit.com/). Here is an example conversation prompt:
```
[ Title: (2019) Cars getting transported on an open deck catch on fire after salty water shorts their batteries; Genre: CatastrophicFailure ]
***
Anonymous: Daaaaaamn try explaining that one to the owners
EDIT: who keeps reposting this for my comment to get 3k upvotes?
Anonymous: "Your car caught fire from some water"
Irythros: Lol, I wonder if any compensation was in order
Anonymous: Almost all of the carriers offer insurance but it isn’t cheap. I guarantee most of those owners declined the insurance.
```
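A conversation prompt in this layout can be built from a list of speaker and message pairs. This is a minimal sketch, assuming the `Speaker: message` line format shown in the example; `build_chat_prompt` is a hypothetical helper, and ending the prompt with a bare speaker name to cue the next reply is an assumption rather than documented behavior.

```python
def build_chat_prompt(title, genre, turns, next_speaker=None):
    """Format a conversation prompt: header, *** separator, one line per turn.

    Hypothetical helper. `turns` is a list of (speaker, message) tuples.
    If `next_speaker` is given, the prompt ends with "Name:" so the model
    can continue as that speaker (an assumption, not documented behavior).
    """
    lines = [f"[ Title: {title}; Genre: {genre} ]", "***"]
    lines += [f"{speaker}: {message}" for speaker, message in turns]
    if next_speaker:
        lines.append(f"{next_speaker}:")
    return "\n".join(lines)


chat_prompt = build_chat_prompt(
    title="(2019) Cars getting transported on an open deck catch on fire after salty water shorts their batteries",
    genre="CatastrophicFailure",
    turns=[
        ("Anonymous", "Daaaaaamn try explaining that one to the owners"),
        ("Anonymous", '"Your car caught fire from some water"'),
    ],
    next_speaker="Irythros",
)
print(chat_prompt)
```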
The annotations can be mixed and matched to help generate towards a specific style.
## Downstream Uses
This model can be used for entertainment, as a creative writing assistant for fiction writers, and to power chatbots.
## Example Code
```
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained('hakurei/lotus-12B')
tokenizer = AutoTokenizer.from_pretrained('hakurei/lotus-12B')

# Annotation header, separator line, then the text to continue.
prompt = '''[ Title: The Dunwich Horror; Author: H. P. Lovecraft; Genre: Horror ]
***
When a traveler'''

input_ids = tokenizer.encode(prompt, return_tensors='pt')
# Sample up to 100 new tokens beyond the prompt.
output = model.generate(input_ids, do_sample=True, temperature=1.0, top_p=0.9, repetition_penalty=1.2, max_new_tokens=100, pad_token_id=tokenizer.eos_token_id)
generated_text = tokenizer.decode(output[0])
print(generated_text)
```
Running this code produces output similar to:
```
[ Title: The Dunwich Horror; Author: H. P. Lovecraft; Genre: Horror ]
***
When a traveler comes to an unknown region, his thoughts turn inevitably towards the old gods and legends which cluster around its appearance. It is not that he believes in them or suspects their reality—but merely because they are present somewhere else in creation just as truly as himself, and so belong of necessity in any landscape whose features cannot be altogether strange to him. Moreover, man has been prone from ancient times to brood over those things most connected with the places where he dwells. Thus the Olympian deities who ruled Hyper
```
## Team members and Acknowledgements
This project would not have been possible without the work done by EleutherAI. Thank you!
- [Anthony Mercurio](https://github.com/harubaru)
- Imperishable_NEET
To reach us, join our [Discord server](https://discord.gg/touhouai).
[![Discord Server](https://discordapp.com/api/guilds/930499730843250783/widget.png?style=banner2)](https://discord.gg/touhouai)