This LLM is good at a lot of things.

deleted

I never chat or role play, so I have no idea how good this LLM is at what it was primarily designed for.

But at tasks like Q&A, problem solving and storytelling it performed as well for me as the leading Mistrals, including Zephyr Beta, Open Hermes 2.5, Dolphin 2.1 and Intel Neural v 3.2.

And it seemed to process my deliberately long and convoluted story prompts better than any other Mistral I tested.

Glad to hear it's working well for you! I'm surprised at how well it turned out, particularly given the amount of PIPPA in there vs. actual instruction examples.

Thanks for giving it a shot.

Hi @chargoddard ,
Could you provide the prompt template for chat?

Also, the DROP score was removed by HF: https://huggingface.co/blog/leaderboard-drop-dive
Your model now has the highest score among 7B models!

deleted

I'm just a user who tests LLMs as a hobby, and I know very little about fine-tuning or how LLMs work.

But I am starting to pick up on a pattern: when people use too many step-by-step instruction examples and prompt pairs from teachers like GPT4, the student LLMs become frustratingly "stubborn". That is, they start ignoring details in the user's prompt in order to say whatever they think is best, which in most cases isn't true or reasonable.

For example, in storytelling, it will make a character use a camera phone even though I explicitly said camcorder for a reason (it was the early 90s and there were no camera phones). Or if I said he was caught stealing something like money from a counter, the LLM (programmed to add suspense) would start saying things like 'he heard footsteps coming down the hall' before he was caught grabbing the money red-handed (people obviously don't grab the money and act surprised about being caught if they hear footsteps first). In short, the training data starts forcing absurd mistakes that blatantly contradict the user prompt, or even what the LLM already wrote.

You may be on to something about finding a balance: adding just enough diverse step-by-step examples and prompt pairs from teacher to student to make smaller LLMs more "intelligent", but not so many that they become stubborn and ignore details in the user prompt, and what was already stated in the response, in favor of pre-packaged elements like always using camera phones or hearing footsteps to build suspense.

@Nailcan
Here's the format I used for the roleplay training data:

### Instruction:

Enter roleplay mode. You are {character name}.

{character bio goes here}

Example session #1:
```
{character name}: words words ...
User: a reply
```

Example session #2:
```
...
```

### Input:
{character name}: words words words
{user name}: words words more words
{character name}: ...
{user name}: ...

### Response:
{character name}:
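For anyone who wants to build prompts in this format programmatically, here is a rough Python sketch. The function name `build_roleplay_prompt`, its parameters, and the exact blank-line placement are my own assumptions for illustration; only the section headers and overall layout come from the format above, so adjust the whitespace if your results look off.

```python
FENCE = "`" * 3  # the triple-backtick fence that wraps each example session


def build_roleplay_prompt(character_name, character_bio, example_sessions, history):
    """Assemble a roleplay prompt in the format described above.

    example_sessions: list of sessions, each a list of (speaker, text) pairs.
    history: list of (speaker, text) pairs for the current conversation.
    Both argument shapes are hypothetical choices made for this sketch.
    """
    lines = [
        "### Instruction:",
        "",
        f"Enter roleplay mode. You are {character_name}.",
        "",
        character_bio,
        "",
    ]
    # Number the example sessions starting from 1, as in the template.
    for i, session in enumerate(example_sessions, start=1):
        lines.append(f"Example session #{i}:")
        lines.append(FENCE)
        lines.extend(f"{speaker}: {text}" for speaker, text in session)
        lines.append(FENCE)
        lines.append("")
    lines.append("### Input:")
    lines.extend(f"{speaker}: {text}" for speaker, text in history)
    lines.append("")
    lines.append("### Response:")
    # End with the character's name so the model continues in character.
    lines.append(f"{character_name}:")
    return "\n".join(lines)


# Usage with made-up character data:
prompt = build_roleplay_prompt(
    "Ada",
    "A mathematician.",
    [[("Ada", "Hello"), ("User", "Hi")]],
    [("Ada", "Welcome"), ("Bob", "Thanks")],
)
print(prompt)
```

The prompt deliberately ends with the character name and a colon, so the generated text is the character's next turn.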

This model is a romantic :D

For whatever reason, this is also still the best 7B RAG answer generator I've tried (running the 4-bit AWQ version from TheBloke). The model seems to

  • follow instructions,
  • reason quite well,
  • produce nice, clean, just right sized outputs, and
  • adapt well when forced to adhere to a certain JSON schema

Something definitely went right with this one.
