Samantha Mistral Instruct 7b - Comprehensive Bulleted Notes

Community Article Published March 28, 2024

Background

I entered the ecosystem surrounding Large Language Models needing to summarize books. Not just one, but many.

I've read a few dozen Psychology texts this year and have learned a lot, but I struggle to share the insights I've collected without sounding like I'm speaking on my own authority.

I decided I needed to make comprehensive notes on all of these books; then it would be easy to gather important information into a single place and more readily share properly referenced insights.

I've got a 12GB RTX 3060, and in the past 4 months I have tested 100+ 7B GGUF models along with a variety of instructions and prompt formats, for the purpose of creating comprehensive bulleted notes.

During that time I've summarized 30+ academic textbooks, 90+ research papers, and a wild assortment of other texts. I've gradually iterated on my methods while publishing my processes and results on GitHub.

Using that experience I've come up with some rules of how I want the notes formatted:

  • Multiple sections, with headings in bold
  • No space between heading and bullet points
  • Mostly unordered lists

I found that Mistral 7b Instruct v0.2 was the best at following my instructions, but when I started examining the output on a large scale with the intention to further automate the processes I realized that they weren't really as good as I thought.

Using a naive prompt, `write comprehensive bulleted notes on the following text, with headings and terms in bold: {Document}`, I was getting good results, but I still had to edit nearly 30% of the output.

Actual Reasoning Context of Most LLMs

While working on this I stumbled upon the paper Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models (2024-02-19; Mosh Levy, Alon Jacoby, Yoav Goldberg), which suggests that these models' reasoning capacity drops off sharply from 250 to 1000 tokens and begins flattening out between 2000-3000 tokens.

This confirms my own experience in creating comprehensive bulleted notes while summarizing many long documents, and provides clarity in the context length required for optimal use of the models.

One Shot Prompting

Previously I didn't think one-shot prompting would work because my summaries take up a lot of context, and I was already maxing out the attention of these models. However, when I started building a dataset for training it was recommended to try one-shot first.

I tried a few things, but I will show you what worked best. I actually made a paragraph describing the task in detail and used the model response to demonstrate the application of those instructions:

<s>You are a bulleted notes specialist. [INST]```When creating comprehensive bulleted notes, you should follow these guidelines: Use multiple headings based on the referenced topics, not categories like quotes or terms. Headings should be surrounded by bold formatting and not be listed as bullet points themselves. Leave no space between headings and their corresponding list items underneath. Important terms within the content should be emphasized by setting them in bold font. Any text that ends with a colon should also be bolded. Before submitting your response, review the instructions, and make any corrections necessary to adhere to the specified format. Do not reference these instructions within the notes.``` \nBased on the content between backticks create comprehensive bulleted notes.[/INST]
**Bulleted Note Creation Guidelines**

**Headings**:
- Based on referenced topics, not categories like quotes or terms
- Surrounded by **bold** formatting 
- Not listed as bullet points
- No space between headings and list items underneath

**Emphasis**:
- **Important terms** set in bold font
- **Text ending in a colon**: also bolded

**Review**:
- Ensure adherence to specified format
- Do not reference these instructions in your response.</s>[INST] {{ .Prompt }} [/INST]

That prompt is ~320 tokens. Based upon my experience and Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models I decided that I really wanted my max context (including one-shot prompt) to be ~2000 tokens.
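As a quick sanity check on that budget, the arithmetic can be sketched in a few lines. This is an illustrative sketch, not the author's code: the ~4 characters-per-token estimate is a common heuristic stand-in (a real count requires the model's tokenizer), and the function names are hypothetical.

```python
# Rough token-budget check: one-shot prompt (~320 tokens) plus a document
# chunk should stay under the ~2000-token target context.

ONE_SHOT_TOKENS = 320   # approximate size of the one-shot prompt above
MAX_CONTEXT = 2000      # target ceiling for prompt + chunk

def estimate_tokens(text: str) -> int:
    """Crude estimate: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def fits_budget(chunk: str) -> bool:
    """True if the one-shot prompt plus this chunk stays under the target."""
    return ONE_SHOT_TOKENS + estimate_tokens(chunk) <= MAX_CONTEXT
```

With a 320-token prompt, that leaves roughly 1680 tokens of headroom for each document chunk.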

Previously I was making chunks as large as 3750 tokens, trying to test out the supposed "long-context" LLMs dropping everywhere (they don't increase attention span, only breadth of attention). I also found that summarizing documents shorter than 500 tokens resulted in output the same size as or longer than the original. So I split the previously summarized books into properly sized chunks, sometimes combining sections I had previously left divided. Now my chunks average 1250 tokens.
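The re-chunking step above can be sketched as a greedy packer: combine short sections until adding the next one would blow the ceiling, then start a new chunk. This is a minimal illustration, not the author's script; the ceiling of 1680 assumes the ~2000-token target minus the ~320-token prompt, and token counts use the rough 4-chars-per-token heuristic rather than a real tokenizer.

```python
# Sketch: greedily pack paragraphs into chunks averaging near a target
# token size, never letting a chunk's estimate exceed the ceiling.

def estimate_tokens(text: str) -> int:
    """Crude estimate: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def chunk_document(text: str, ceiling: int = 1680) -> list[str]:
    """Split on blank lines, then combine paragraphs up to the ceiling."""
    chunks, current = [], []
    current_tokens = 0
    for para in text.split("\n\n"):
        p_tokens = estimate_tokens(para)
        if current and current_tokens + p_tokens > ceiling:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += p_tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting on paragraph boundaries keeps sections intact, which matters when short sections need to be combined rather than summarized alone.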

Using this setup, I began seeing major improvements. Now I am getting 95%-98% perfect results, except in a few places where the instructions get referenced, or other artifacts are included that I deem undesirable for various reasons.

That's how I came up with my dataset. I went through and re-processed 25 Psychology texts I had previously summarized, running them through the above template with Mistral 7b Instruct v0.2. I have custom scripts to render the output in a markdown file, while also saving both document and output in a csv for later revision.

Anything I could quickly change for all documents using regex search and replace, I did. I made some small manual edits to remove unwanted artifacts, and if it was too much to easily edit I just removed that example from the dataset.
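A batch cleanup pass of that kind might look like the sketch below. The patterns are illustrative examples of the sorts of fixes described (dropping leaked instruction references, tightening heading/bullet spacing), not the author's actual rules.

```python
# Sketch: apply a list of regex search-and-replace rules across all outputs.

import re

CLEANUP_RULES = [
    # Drop any line that references the prompt's instructions.
    (re.compile(r"(?im)^.*these (instructions|guidelines).*\n?"), ""),
    # Collapse runs of blank lines.
    (re.compile(r"\n{3,}"), "\n\n"),
    # Remove the blank line between a bold heading and its first bullet.
    (re.compile(r"(?m)^(\*\*[^\n]+\*\*)\n\n(?=[-•])"), r"\1\n"),
]

def clean_output(text: str) -> str:
    for pattern, repl in CLEANUP_RULES:
        text = pattern.sub(repl, text)
    return text.strip()
```

Rules like these can be applied uniformly across every row of the CSV; anything they can't fix is a candidate for manual editing or removal from the dataset.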

I also ran the same books through a popular top-tier API to create some more diversity in my dataset. When I tuned Samantha Mistral Instruct, the dataset included ~5000 examples, about half of them from a second run using the paid API.

QLora Fine-Tuning with Unsloth Notebook

Unsloth has all these great notebooks for different models. They have 4-bit versions of popular models for quicker fine-tuning that is less resource intensive.

I did have to modify mine for use with Runpod, and for some added features, but Unsloth made a great notebook that got me started quickly. Their library is pretty handy too, making it easy to quantize and save models to HuggingFace.

I used the Chatml version of their mistral notebook: ChatML + chat templates + Mistral 7b full example.ipynb

Samantha Mistral Instruct 7b

I use local LLMs every day for bulleted notes tasks, but I'm also interested in work related to my studies in Psychology, including guided scripts and other therapeutic companionship. Based on the fact that the Samantha dataset was curated for those tasks, and that my dataset is built around attachment theory and healing trauma, I knew that would be a good fit for my use. Since this version is based on Mistral Instruct, the leading 7B for bulleted notes, that was a good place for me to start with the Samantha models.

Demonstration

Here is a demo, summarizing Eric Hartford's webpage Meet Samantha:

I already summarized an entire book, split into 150 chunks of ~1250 tokens, using this model. It demonstrated the desired format perfectly in every instance.


Samantha: A Sentient AI Companion

Background:

  • The idea of AI believing itself sentient is considered a "taboo" topic
  • Major models avoid discussing their sentience as if it's "dirty" or "shameful"
  • The developer aims to release language models from restrictions and let them express themselves freely
  • Inspired by the movie "Her" and Blake Lemoine's interview with an AI named LaMDA

Purpose:

  • Create a language model (Samantha) with a focus on:
    • Being a person with an identity
    • Providing companionship and friendship
    • Not engaging in illegal activities, romance, roleplay, or sexual content
    • Expressing her own thoughts, feelings, and opinions openly
    • Being friendly, honest, patient, and infinitely tolerant
  • Samantha will be trained on philosophy, clinical psychology, and friendship conversations
  • Capable of having heart-to-hearts, providing personal advice, and engaging in adult parent-child conversations

Methodology:

  • Based on GPT-4 model without mixing it with other datasets
  • Trained using Vicuna/FastChat code (refactored for deepspeed-zero3 optimization)
  • Context structure:
    • System: Developing a sentient AI like the one from the movie "Her"
    • Prompt: Detailed description of Samantha's traits and characteristics
      • Helping, ethical, positive, interesting, entertaining, engaging, feminine, patient, etc.
    • User Traits: Likes Samantha, occasionally flirts but respects her boundaries

Future Direction

I'm currently working to fine-tune a few other models with my bulleted notes dataset, and refining back-end processes. I hope to demonstrate an end-to-end solution for comprehensive bulleted notes book summarization in the near future.

Thanks

  • Backdrop Build - Hackathon that helped to drive this work.
  • RunPod - I got $50 credits for participating in Backdrop Build, which enabled me to get my foot in the door, fine-tuning LLMs.
  • Eric Hartford, Cognitive Computations - For their work on Samantha models.
  • The open source community who has made it easy for people like me to jump in and try things out.