emozilla committed on
Commit d4a2524
1 Parent(s): f282d2d

Update README.md

Files changed (1)
  1. README.md +26 -28
README.md CHANGED
@@ -7,12 +7,12 @@ datasets:
  
  # mpt-7b-storysummarizer
  
- This is a fine-tuned version of [mosaicml/mpt-7b-storywriter](https://huggingface.co/mosaicml/mpt-7b-storywriter) on [emozilla/booksum-summary-analysis_gptneox-8192](https://huggingface.co/datasets/emozilla/booksum-summary-analysis_gptneox-8192), which is adapted from [kmfoda/booksum](https://huggingface.co/datasets/kmfoda/booksum).
- The training run was performed using [llm-foundry](https://github.com/mosaicml/llm-foundry) on an 8xA100 80 GB node at a context length of 8192 using [this configuration](https://gist.github.com/jquesnelle/f9fb28b8102cba8e79a6c08f132fbf49). The run can be viewed on [wandb](https://wandb.ai/emozilla/booksum/runs/457ym4r9).
+ This is a fine-tuned version of [mosaicml/mpt-7b-storywriter](https://huggingface.co/mosaicml/mpt-7b-storywriter) intended for summarization and literary analysis of fiction stories.
  
- ## How to Use
+ The code for this model includes the adaptations from [Birchlabs/mosaicml-mpt-7b-chat-qlora](https://huggingface.co/Birchlabs/mosaicml-mpt-7b-chat-qlora) which allow MPT models to be loaded with `device_map="auto"` and `load_in_8bit=True`.
+ It also has the [latest key-value cache MPT code](https://github.com/mosaicml/llm-foundry/pull/210) to allow for fast inference with `transformers` (thus, `use_cache` is set to `True` in `config.json`).
  
- This model is intended for summarization and literary analysis of fiction stories. It can be prompted in one of two ways:
+ ## How to Use
  
  ```
  SOME_FICTION
@@ -28,29 +28,6 @@ SOME_FICTION
  ### ANALYSIS:
  ```
  
- A `repetition_penalty` of ~1.04 seems to be best. For summary prompts, simple greedy search suffices, while a temperature of 0.8 works well for analysis.
- The model often prints `'#'` to delineate the end of a summary or analysis. You can use a custom `transformers.StoppingCriteria` to end a generation.
- 
- ```python
- class StopOnTokens(StoppingCriteria):
-     def __init__(self, stop_ids):
-         self.stop_ids = stop_ids
- 
-     def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
-         for stop_id in self.stop_ids:
-             if input_ids[0][-1] == stop_id:
-                 return True
-         return False
- 
- stop_ids = tokenizer("#").input_ids
- stopping_criteria = StoppingCriteriaList([StopOnTokens(stop_ids)])
- ```
- 
- Pass `stopping_criteria` as an argument to the model's `generate` function to stop on `#`.
- 
- The code for this model includes adaptations from [Birchlabs/mosaicml-mpt-7b-chat-qlora](https://huggingface.co/Birchlabs/mosaicml-mpt-7b-chat-qlora) which allow MPT models to be loaded with `device_map="auto"` and `load_in_8bit=True`.
- For longer contexts, the following is recommended:
- 
  ```python
  tokenizer = AutoTokenizer.from_pretrained("emozilla/mpt-7b-storysummarizer")
  model = AutoModelForCausalLM.from_pretrained(
@@ -58,4 +35,25 @@ model = AutoModelForCausalLM.from_pretrained(
    load_in_8bit=True,
    trust_remote_code=True,
    device_map="auto")
- ```
+ ```
+ 
+ ## Example outputs
+ 
+ Outputs on the text of [_Waystation City_](https://www.uncannymagazine.com/article/waystation-city/) (6,287 tokens)
+ 
+ ### Summary
+ 
+ `temperature=0.6`, `repetition_penalty=1.04`, `top_p=0.95`, `top_k=50`, `do_sample=True`, `max_new_tokens=1024`
+ 
+ > The Disappeared Twins asks Gerty to record their disappearance and she agrees on the condition that they tell her their story. After hearing their story, Gerty tells them that she will give them passage to the lower city and they accept. They board a late-night tram and Micheala joins them. At the end of the line, they bid goodbye to Gerty and Micheala advises them to wait patiently for their turn to disappear. The twins decide to wait and depart. As they descend the stairs leading into the lower city, they pass a sign warning them of the dangers awaiting them and arrive at the dock. Here, the twins paint a message on the wall reading "We Want To Stay What We Are" and then approach the ferry drivers. The ferry drivers notice them and prepare to attack, but Gerty warns them and they flee. The twins decide to follow the advice of the sign and wait. They watch the boats appear and disappear in the holes in the fabric of reality and eventually one appears that seems to be from the 1970's. The twins run forward and jump into the hole. However, as they leap into the hole, the boat disappears and the hole snaps shut. Claude calls out to Gerty to write down their story and then jumps into the hole. He lands in a field of wheat and waits for Gerty to catch up with him. When she arrives, he instructs her to write down the story of the Disappeared Twins. Claude then dissolves into the hole and Gerty watches as the field turns to sand, then rock, and then water. She sees him shaking his head and then he disappears.
+ 
+ ### Analysis
+ 
+ `temperature=0.75`, `repetition_penalty=1.04`, `top_p=0.95`, `top_k=50`, `do_sample=True`, `max_new_tokens=1024`
+ 
+ > This chapter details the twins' journey to the Lower City, where the reader learns that it is not safe for anyone who wishes to leave to go. By describing the twins' experiences and thoughts, the reader gets a sense of what it feels like to be trapped in a seemingly permanent limbo, unable to move forward or backward in time. In addition, the readers realizes that despite the danger of staying, the twins cannot move on, nor do they seem interested in leaving. The twins seek out Gretel's expertise because they believe that she can help them get home and avoid dissolving entirely. However, Gretel explains that she can only help if they tell her their entire stories, including their pasts--something the twins have thus far avoided doing. Furthermore, they never speak of why they wish to return to the time in which they lived. The fact that they never explain their desire to leave suggests that this limbo is a sort of purgatory, meant to test the twins' resolve to stay or go. Thus, the twins may realize that they do not wish to leave Waystation City, but they have not quite figured out how to fully settle in.
+ 
+ ## Training
+ 
+ The model was trained on [emozilla/booksum-summary-analysis_gptneox-8192](https://huggingface.co/datasets/emozilla/booksum-summary-analysis_gptneox-8192), which is adapted from [kmfoda/booksum](https://huggingface.co/datasets/kmfoda/booksum).
+ The training run was performed using [llm-foundry](https://github.com/mosaicml/llm-foundry) on an 8xA100 80 GB node at a sequence length of 8,192 tokens using [this configuration](https://gist.github.com/jquesnelle/f9fb28b8102cba8e79a6c08f132fbf49). The run can be viewed on [wandb](https://wandb.ai/emozilla/booksum/runs/457ym4r9).
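
Below is a minimal, self-contained sketch that combines the loading snippet with the generation settings above and the `#` stopping trick described in the previous revision of the card. It is illustrative only: `some_fiction` is a placeholder, the prompt layout (story text followed by `### ANALYSIS:`) abbreviates the partially shown example above, and 8-bit loading assumes `bitsandbytes` is installed.

```python
# Illustrative sketch, not part of the model card: load mpt-7b-storysummarizer
# in 8-bit and generate a literary analysis with the settings listed above.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

tokenizer = AutoTokenizer.from_pretrained("emozilla/mpt-7b-storysummarizer")
model = AutoModelForCausalLM.from_pretrained(
    "emozilla/mpt-7b-storysummarizer",
    load_in_8bit=True,          # requires bitsandbytes
    trust_remote_code=True,
    device_map="auto")

class StopOnTokens(StoppingCriteria):
    """Stop once the last generated token is one of `stop_ids`; the model
    tends to print '#' when a summary or analysis is finished."""
    def __init__(self, stop_ids):
        self.stop_ids = stop_ids

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_id in self.stop_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

stop_ids = tokenizer("#").input_ids
stopping_criteria = StoppingCriteriaList([StopOnTokens(stop_ids)])

some_fiction = "..."  # placeholder: the story text to analyze
prompt = f"{some_fiction}\n\n### ANALYSIS:\n"  # abbreviated form of the prompt format above

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.75,           # analysis settings from the Example outputs section
    repetition_penalty=1.04,
    top_p=0.95,
    top_k=50,
    max_new_tokens=1024,
    stopping_criteria=stopping_criteria,
)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

For summaries, swap in the summary settings shown above (`temperature=0.6` with the other values unchanged); the earlier revision of the card also notes that plain greedy search works well for summary prompts.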