---
license: llama2
datasets:
- totally-not-an-llm/EverythingLM-data
---
|
|
|
*wipwipwip* |
|
|
|
# EverythingLM-13b-16k |
|
|
|
Introducing EverythingLM, a llama-2-based, general-purpose 13b model with a 16k context window thanks to LlongMa. The model is trained on the EverythingLM dataset; more info can be found on the dataset page.
|
|
|
The model is completely uncensored. |
|
|
|
This model is an early test of the EverythingLM dataset and some new experimental principles, so don't consider it SOTA. |
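If you want to try the extended context, here is a minimal loading sketch. It assumes the 16k window comes from LlongMa-style linear RoPE scaling (4x over llama-2's native 4k positions) and that the repo id matches this card's title; both are assumptions, not details stated above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "totally-not-an-llm/EverythingLM-13b-16k"  # assumed repo id, matching this card's title

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    # Assumption: LlongMa-style linear RoPE scaling, 4x over llama-2's native 4k positions.
    # Only needed if the uploaded config does not already set it.
    rope_scaling={"type": "linear", "factor": 4.0},
)
```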
|
|
|
### Notable features: |
|
- Automatically triggered CoT reasoning. |
|
- Verbose and detailed replies. |
|
- Creative stories. |
|
- Better prompt understanding. |
|
|
|
### Prompt format: |
|
It is a modified Vicuna format, the same one used in many of ehartford's models.
|
``` |
|
You are a helpful AI assistant. |
|
|
|
USER: <prompt> |
|
ASSISTANT: |
|
``` |
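A quick sketch of building a prompt in this format and generating a reply, reusing the model and tokenizer from the loading snippet above. The whitespace follows the template as written, and the sampling settings are illustrative rather than values recommended by this card.

```python
system = "You are a helpful AI assistant."
user_prompt = "Write a detailed short story about a lighthouse keeper."

# Matches the template above: system line, blank line, USER/ASSISTANT turns.
prompt = f"{system}\n\nUSER: {user_prompt}\nASSISTANT:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
reply = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)
```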
|
|
|
Training took about 1 hour using QLoRA on 1x A100, so this model can be recreated for about $3. The QLoRA adapter can be found here: https://huggingface.co/totally-not-an-llm/EverythingLM-13b-peft.
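If you'd rather apply the published adapter yourself, a hedged sketch using peft follows. The base model id and the 4-bit settings are assumptions (the card doesn't specify them), and for the full 16k window the same RoPE scaling assumption as above would apply to the base model.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "meta-llama/Llama-2-13b-hf"                    # assumed base model, not stated in this card
adapter_id = "totally-not-an-llm/EverythingLM-13b-peft"  # adapter linked above

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # optional: keep it in 4-bit, as in QLoRA training
)
model = PeftModel.from_pretrained(base, adapter_id)
```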
|
|
|
### Model quirks: |
|
- Due to the nature of the dataset, it does better with more detail. I've found it gives much better stories when I provide more requirements. |
|
- It really likes to use numbered lists. I don't necessarily have a problem with this, but it's something to note when training on the dataset.
|
- It prefers writing fairy tales over anything else, which is strange. This can easily be fixed by prompting.
|
- Occasionally it will fall into repetition; this seems to be a common issue with llama-2 models (see the sketch after this list).
|
- Haven't tested pushing it all the way to 16k context. |
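If you hit the repetition quirk, the usual transformers sampling knobs are worth trying; the values below are guesses, not settings tuned for this model.

```python
# Illustrative knobs for taming repetition; values are not recommendations from this card.
output_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.15,
    no_repeat_ngram_size=4,
)
```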
|
|
|
### Future plans: |
|
- Native finetune. |
|
- Other model sizes. |
|
- Improve dataset by:
  - Regenerating using gpt-4.
  - A bit more data with more diversity.
  - Refactoring the dataset generation script.
|
- Test some model merges using this model. |