Update README.md
README.md
CHANGED
@@ -7,12 +7,12 @@ datasets:

# mpt-7b-storysummarizer

This is a fine-tuned version of [mosaicml/mpt-7b-storywriter](https://huggingface.co/mosaicml/mpt-7b-storywriter).
The training run was performed using [llm-foundry](https://github.com/mosaicml/llm-foundry) on an 8xA100 80 GB node at an 8192-token context length using [this configuration](https://gist.github.com/jquesnelle/f9fb28b8102cba8e79a6c08f132fbf49). The run can be viewed on [wandb](https://wandb.ai/emozilla/booksum/runs/457ym4r9).

```
SOME_FICTION
@@ -28,29 +28,6 @@ SOME_FICTION
### ANALYSIS:
```

A `repetition_penalty` of ~1.04 seems to work best. For summary prompts, simple greedy search suffices, while a temperature of 0.8 works well for analysis prompts.
The model often prints `'#'` to delineate the end of a summary or analysis. You can use a custom `transformers.StoppingCriteria` (such as the `StopOnTokens` class below) to end the generation there.

```python
from transformers import StoppingCriteria, StoppingCriteriaList
import torch

class StopOnTokens(StoppingCriteria):
    def __init__(self, stop_ids):
        self.stop_ids = stop_ids

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Stop as soon as the most recently generated token is one of the stop ids
        for stop_id in self.stop_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

stop_ids = tokenizer("#").input_ids
stopping_criteria = StoppingCriteriaList([StopOnTokens(stop_ids)])
```
Pass `stopping_criteria` as an argument to the model's `generate` function to stop on `#`.
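For example, a minimal sketch of such a call (this snippet is illustrative and not part of the original card; it assumes `model` and `tokenizer` have been loaded as shown below, `stopping_criteria` is defined as above, and `prompt` stands in for text in the prompt format shown earlier):

```python
# Illustrative sketch: greedy decoding (suggested above for summaries) that stops on '#'.
prompt = "..."  # placeholder: a story plus the prompt template shown above
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=512,                   # illustrative budget
    repetition_penalty=1.04,              # ~1.04 suggested above
    stopping_criteria=stopping_criteria,  # ends the generation at '#'
)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```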

The code for this model includes adaptations from [Birchlabs/mosaicml-mpt-7b-chat-qlora](https://huggingface.co/Birchlabs/mosaicml-mpt-7b-chat-qlora) which allow MPT models to be loaded with `device_map="auto"` and `load_in_8bit=True`. For longer contexts, the following is recommended:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("emozilla/mpt-7b-storysummarizer")
model = AutoModelForCausalLM.from_pretrained(
    "emozilla/mpt-7b-storysummarizer",
    load_in_8bit=True,
    trust_remote_code=True,
    device_map="auto")
```

# mpt-7b-storysummarizer

This is a fine-tuned version of [mosaicml/mpt-7b-storywriter](https://huggingface.co/mosaicml/mpt-7b-storywriter) intended for summarization and literary analysis of fiction stories.

The code for this model includes the adaptations from [Birchlabs/mosaicml-mpt-7b-chat-qlora](https://huggingface.co/Birchlabs/mosaicml-mpt-7b-chat-qlora) which allow MPT models to be loaded with `device_map="auto"` and `load_in_8bit=True`.
It also has the [latest key-value cache MPT code](https://github.com/mosaicml/llm-foundry/pull/210) to allow for fast inference with `transformers` (thus, `use_cache` is set to `True` in `config.json`).

## How to Use

```
SOME_FICTION
### ANALYSIS:
```

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("emozilla/mpt-7b-storysummarizer")
model = AutoModelForCausalLM.from_pretrained(
    "emozilla/mpt-7b-storysummarizer",
    load_in_8bit=True,
    trust_remote_code=True,
    device_map="auto")
```
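As a usage illustration (a sketch under stated assumptions, not an official recipe from this card), the snippet below builds an analysis-style prompt and generates with the sampling settings listed under Example outputs; the file name and the exact prompt layout are placeholders, since the full prompt template is abbreviated above.

```python
# Illustrative sketch: analysis-style generation using the sampling settings
# listed under "Example outputs" below. The file name and prompt layout are
# placeholders; follow the prompt format shown above.
story = open("story.txt").read()
prompt = f"{story}\n\n### ANALYSIS:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.75,
    top_p=0.95,
    top_k=50,
    repetition_penalty=1.04,
)
analysis = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(analysis)
```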

## Example outputs

Outputs on the text of [_Waystation City_](https://www.uncannymagazine.com/article/waystation-city/) (6,287 tokens):

### Summary

`temperature=0.6`, `repetition_penalty=1.04`, `top_p=0.95`, `top_k=50`, `do_sample=True`, `max_new_tokens=1024`
> The Disappeared Twins asks Gerty to record their disappearance and she agrees on the condition that they tell her their story. After hearing their story, Gerty tells them that she will give them passage to the lower city and they accept. They board a late-night tram and Micheala joins them. At the end of the line, they bid goodbye to Gerty and Micheala advises them to wait patiently for their turn to disappear. The twins decide to wait and depart. As they descend the stairs leading into the lower city, they pass a sign warning them of the dangers awaiting them and arrive at the dock. Here, the twins paint a message on the wall reading "We Want To Stay What We Are" and then approach the ferry drivers. The ferry drivers notice them and prepare to attack, but Gerty warns them and they flee. The twins decide to follow the advice of the sign and wait. They watch the boats appear and disappear in the holes in the fabric of reality and eventually one appears that seems to be from the 1970's. The twins run forward and jump into the hole. However, as they leap into the hole, the boat disappears and the hole snaps shut. Claude calls out to Gerty to write down their story and then jumps into the hole. He lands in a field of wheat and waits for Gerty to catch up with him. When she arrives, he instructs her to write down the story of the Disappeared Twins. Claude then dissolves into the hole and Gerty watches as the field turns to sand, then rock, and then water. She sees him shaking his head and then he disappears.

### Analysis

`temperature=0.75`, `repetition_penalty=1.04`, `top_p=0.95`, `top_k=50`, `do_sample=True`, `max_new_tokens=1024`
> This chapter details the twins' journey to the Lower City, where the reader learns that it is not safe for anyone who wishes to leave to go. By describing the twins' experiences and thoughts, the reader gets a sense of what it feels like to be trapped in a seemingly permanent limbo, unable to move forward or backward in time. In addition, the readers realizes that despite the danger of staying, the twins cannot move on, nor do they seem interested in leaving. The twins seek out Gretel's expertise because they believe that she can help them get home and avoid dissolving entirely. However, Gretel explains that she can only help if they tell her their entire stories, including their pasts--something the twins have thus far avoided doing. Furthermore, they never speak of why they wish to return to the time in which they lived. The fact that they never explain their desire to leave suggests that this limbo is a sort of purgatory, meant to test the twins' resolve to stay or go. Thus, the twins may realize that they do not wish to leave Waystation City, but they have not quite figured out how to fully settle in.

## Training

The model was trained on [emozilla/booksum-summary-analysis_gptneox-8192](https://huggingface.co/datasets/emozilla/booksum-summary-analysis_gptneox-8192), which is adapted from [kmfoda/booksum](https://huggingface.co/datasets/kmfoda/booksum).
The training run was performed using [llm-foundry](https://github.com/mosaicml/llm-foundry) on an 8xA100 80 GB node at an 8,192-token sequence length using [this configuration](https://gist.github.com/jquesnelle/f9fb28b8102cba8e79a6c08f132fbf49). The run can be viewed on [wandb](https://wandb.ai/emozilla/booksum/runs/457ym4r9).