grimulkan committed
Commit e3b97a2 (1 parent: c5bb0a0)

Update README.md

Files changed (1):
  1. README.md +25 -14
README.md CHANGED
@@ -2,19 +2,30 @@
  license: llama2
  ---

- This model is the secret weapon behind the [Aurelian](https://huggingface.co/grimulkan/aurelian-alpha0.1-70b-rope8-32K-fp16) series of models. It takes a chunk of story as the input, and produces the prompt that would have produced it. Basically, it converts a story into Q&A format, for the purpose of training another model for instruct-tuned story writing. I made it for internal use, and it is not very user-friendly. But it's what I've got until I get the compute to re-train it.
  A 6-bit EXL2 quantization is available [here](https://huggingface.co/grimulkan/story-reverse-prompt-70b-rope8-32K-6bpw_h6_exl2).


  The steps to use this model are:
- - Get a plaintext version of a story (hopefully human-written, that would be the main point of using a model like this).
- - Divide the story into chunks, typically less than 3000 tokens in length. Try to break on chapter boundaries, and strip any non-story text, like the chapter number, page number, etc.
- - Setup the initial prompt of the model (see below), and pass it the first chunk. Model produces a prompt that could generate it (see example).
- - Concatenate the previous output cumulatively (context history), and pass the combined input along with the next story chunk, similar to a normal chat conversation. Model produces the writing prompt corresponding to the 2nd chunk.
- - The reason the model accepts a conversation history and has 32K of context so to keep the writing prompts sounding natural and pay attention to prior story context.
- - For instance, let's say the character meets an elf in the new chunk, but the elf was already introduced some chunks before in the story. When writing the prompt, the model would correctly refer to 'the elf', rather than 'an elf', since it knows the prior story context. This is the main advantage of using this model vs trying to generate a standalone summary per story chunk. Standalone summaries end up sounding very inconsistent over a long story.
- - Depending on how much VRAM you have, you may need to limit your context length (or truncate at 32K anyway, leaving room for the output), just like any other chat conversation. Most clients will do this automatically (egs., oobabooga). You will lose prior story context, but 32K is pretty big if you can use it.
- - Continue this process, until your entire story text is converted into a series of writing prompts and corresponding story chunks as output, that you can convert to egs., fastchat format for SFT.


  ## Input format:
@@ -41,7 +52,7 @@ TASK: Write a detailed prompt that would generate the above section of the story
  <s>ASSISTANT:
  ```

- - The model will respond with the output.
  - Note that there is a blank space at the end of the above completion request (after `ASSISTANT:`).
  - I used `<NO-LINE-BREAK>` to indicate that there is not supposed to be a line break there, but I added a line break in the above text, just for human readability.
  - The first prompt (with the YES response) is basically a hard-coded way to add some background info, outside the system prompt. It is optional, but the model was trained to expect it (even if it only provides trivial information like 'This is a fictional story').
@@ -71,11 +82,11 @@ TASK: Write a detailed prompt that would generate the above section of the story


  TASK: Write a detailed prompt that would generate the above section of the story.</s><NO-LINE-BREAK>
- <s>ASSISTANT:
  ```
  and so on.

- If you run out of context length, you'd need to drop the oldest chunks/prompts (most clients will do this automatically). It is optional whether you want to drop the first general information/background prompt or not.


  ## Example:
@@ -147,7 +158,7 @@ Amid this backdrop of half-imagined dystopia and unlikely vibrancy, crime ripple


  TASK: Write a detailed prompt that would generate the above section of the story.</s><NO-LINE-BREAK>
- <s>ASSISTANT:
  ```

  **Model output (2nd chunk):**
@@ -157,6 +168,6 @@ Describe some of the horrid and grotesque creatures that stalk the shadows of th
  Also mention the crime that is rampant in the city. Describe some of the types of illicit activities that are common in the city.</s>


- and so on. As you can see, you can get varied responses, sometimes specific and detailed (like the 1st chunk output), sometimes short (allowing the response to be more open-ended). This was done to mimic the different ways humans might ask to write a story section. Sometimes they may want to tell the model exactly what to write, sometimes they want the model to choose.

  Right now, there is no way to control which kind of output you get from this model, but you can regenerate until you get the desired length/level of detail if you like.
 
  license: llama2
  ---

+ This model is the secret weapon behind the [Aurelian](https://huggingface.co/grimulkan/aurelian-alpha0.1-70b-rope8-32K-fp16) series of models. It takes a chunk of a story as input and generates the prompt that could have produced it. Basically, it converts a story into Q&A format, for the purpose of training another model for instruct-tuned story writing.
+
+ I made it for internal use, and it is not very user-friendly. But it's what I've got until I get the compute to re-train it.
+
  A 6-bit EXL2 quantization is available [here](https://huggingface.co/grimulkan/story-reverse-prompt-70b-rope8-32K-6bpw_h6_exl2).


  The steps to use this model are:
+ - Get a plaintext version of a story.
+   - Hopefully human-written; that would be the main point of using a model like this.
+ - Divide the story into chunks.
+   - Typically less than 3000 tokens per chunk.
+   - Try to break on chapter boundaries.
+   - Try to strip any non-story text, like the chapter number, page number, etc.
+ - Set up the initial prompt of the model (see below), and pass it the first chunk.
+   - The model produces a prompt that could generate it (see example).
+ - Concatenate the previous output cumulatively (context history, see examples below), and pass the combined input along with the next story chunk, similar to a normal chat conversation.
+   - The model produces the writing prompt corresponding to the 2nd chunk.
+ - The reason the model accepts a conversation history and has 32K of context is to keep the writing prompts sounding natural and to pay attention to prior story context.
+   - For instance, let's say the character meets an elf in the new chunk, but the elf was already introduced some chunks before in the story. When writing the prompt, the model would correctly refer to 'the elf', rather than 'an elf', since it knows the prior story context.
+   - This is the main advantage of using this model vs. trying to generate a standalone summary per story chunk. Standalone summaries end up sounding very inconsistent over a long story.
+ - Depending on how much VRAM you have, you may need to limit your context length (or truncate at 32K anyway, leaving room for the output), just like any other chat conversation. Most clients will do this automatically (e.g., oobabooga). You will lose prior story context, but 32K is pretty big if you can use all of it.
+ - Continue this process until your entire story text is converted into a series of writing prompts and corresponding story chunks as output.
+   - Then you can convert the Q&A pairs to, e.g., fastchat format in a .json for SFT (see the sketch after this list).
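For concreteness, here is a rough sketch of that loop in Python. It is not part of this repo: the OpenAI-compatible endpoint URL, the character-based chunker, and the one-record-per-chunk fastchat layout are all assumptions, and the chat endpoint is assumed to apply this model's prompt template (see "Input format" below) for you.

```python
import json
import requests

# Assumed: an OpenAI-compatible chat endpoint (e.g. one exposed by oobabooga).
API_URL = "http://localhost:5000/v1/chat/completions"

def split_into_chunks(story_text, max_chars=9000):
    """Very naive chunker: ~3000 tokens is very roughly 9000+ characters of English.
    In practice, prefer breaking on chapter boundaries and strip chapter/page numbers first."""
    chunks, current = [], ""
    for para in story_text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

def reverse_prompt_story(story_text):
    """Feed chunks to the model one at a time, keeping the whole conversation as context."""
    history = []   # running chat history, so later prompts can reference earlier story events
    pairs = []     # (generated writing prompt, story chunk) pairs for the SFT dataset
    for chunk in split_into_chunks(story_text):
        user_turn = (chunk + "\n\nTASK: Write a detailed prompt that would generate "
                             "the above section of the story.")
        messages = history + [{"role": "user", "content": user_turn}]
        resp = requests.post(API_URL, json={"model": "story-reverse-prompt-70b",  # name depends on your server
                                            "messages": messages,
                                            "max_tokens": 512})
        resp.raise_for_status()
        writing_prompt = resp.json()["choices"][0]["message"]["content"]
        history = messages + [{"role": "assistant", "content": writing_prompt}]
        pairs.append((writing_prompt, chunk))
    return pairs

def to_fastchat(pairs, out_path="sft_data.json"):
    """Write each (writing prompt, story chunk) pair as a fastchat/ShareGPT-style record."""
    records = [{"id": f"story_{i}",
                "conversations": [{"from": "human", "value": prompt},
                                  {"from": "gpt", "value": chunk}]}
               for i, (prompt, chunk) in enumerate(pairs)]
    with open(out_path, "w") as f:
        json.dump(records, f, indent=2)
```

Depending on how you plan to train, you may instead prefer to keep all chunks of one story in a single multi-turn conversation record rather than one record per chunk.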
 

  ## Input format:
 
  <s>ASSISTANT:
  ```

+ - The model will respond with the output (see an example below).
  - Note that there is a blank space at the end of the above completion request (after `ASSISTANT:`).
  - I used `<NO-LINE-BREAK>` to indicate that there is not supposed to be a line break there, but I added a line break in the above text, just for human readability.
  - The first prompt (with the YES response) is basically a hard-coded way to add some background info, outside the system prompt. It is optional, but the model was trained to expect it (even if it only provides trivial information like 'This is a fictional story').
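If you are assembling the raw completion string yourself (rather than letting a chat client apply the template), the two notes about `<NO-LINE-BREAK>` and the trailing space translate to something like the sketch below; `story_chunk` is just a placeholder, and the exact blank-line spacing around the chunk follows the full example further down.

```python
story_chunk = "..."  # placeholder for one chunk of the story

# No line break between "</s>" and "<s>ASSISTANT:" (that is what <NO-LINE-BREAK>
# marks above), and there IS a single trailing space after "ASSISTANT:".
request_turn = (
    story_chunk
    + "\n\nTASK: Write a detailed prompt that would generate the above section of the story.</s>"
    + "<s>ASSISTANT: "
)
```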
 


  TASK: Write a detailed prompt that would generate the above section of the story.</s><NO-LINE-BREAK>
+ <s>ASSISTANT:
  ```
  and so on.

+ If you run out of context length, you'd need to drop the oldest chunks/prompts (most clients will do this automatically). It is up to you whether to drop or preserve the first general information/background prompt.
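A minimal sketch of that truncation, assuming the history is stored as a list of role/content messages and you have some tokenizer with an `encode` method (both assumptions; nothing here is specific to this repo):

```python
def truncate_history(history, tokenizer, max_tokens=30000, keep_background=True):
    """Drop the oldest (user, assistant) pairs until the conversation fits in context.
    max_tokens is set below 32K to leave room for the generated writing prompt.
    Optionally always keep the first background-info exchange (messages 0 and 1)."""
    def n_tokens(msgs):
        return sum(len(tokenizer.encode(m["content"])) for m in msgs)

    protected = history[:2] if keep_background else []
    rest = history[2:] if keep_background else list(history)
    while rest and n_tokens(protected + rest) > max_tokens:
        rest = rest[2:]  # drop the oldest chunk/prompt pair
    return protected + rest
```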


  ## Example:
 


  TASK: Write a detailed prompt that would generate the above section of the story.</s><NO-LINE-BREAK>
+ <s>ASSISTANT:
  ```

  **Model output (2nd chunk):**
 
  Also mention the crime that is rampant in the city. Describe some of the types of illicit activities that are common in the city.</s>


+ and so on. As you can see, you can get varied responses: sometimes specific and detailed (like the 1st chunk output), sometimes short (leaving the story details more open-ended). This was done to mimic the different ways humans might ask to write a story section. Sometimes they may want to tell the model exactly what to write, and sometimes they want the model to choose.

  Right now, there is no way to control which kind of output you get from this model, but you can regenerate until you get the desired length/level of detail if you like.