nxnhjrjtbjfzhrovwl committed
Commit 471cada
1 Parent(s): 9aa97de

Update README.md

Files changed (1)
  1. README.md +16 -15
README.md CHANGED
@@ -22,7 +22,7 @@ The below are the contents of the original model card:

 LIMARP-Llama2 is an experimental [Llama2](https://huggingface.co/meta-llama) finetune narrowly focused on novel-style roleplay chatting.

- To considerably facilitate uploading and distribution, LoRA adapters have been provided instead of the merged models. You should get the Llama2 base model first, either from Meta or from one of the reuploads on HuggingFace (for example [here](https://huggingface.co/NousResearch/Llama-2-7b-hf) and [here](https://huggingface.co/NousResearch/Llama-2-13b-hf)). It is also possible to apply the LoRAs on different Llama2-based models (e.g. [LLongMA-2](https://huggingface.co/conceptofmind/LLongMA-2-7b) or [Nous-Hermes-Llama2](https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b)), although this is largely untested and the final results may not work as intended.

 ## Model Details

@@ -81,7 +81,6 @@ And here is a sample of how the model is intended to behave with proper chat and

 ### More detailed notes on prompt format and other settings
 - **The model has been tested mainly using Oobabooga's `text-generation-webui` as a backend**
- - **For somewhat improved compatibility with KoboldAI, this version of the model has been trained _without_ BOS or EOS tokens. They should be disabled in `text-generation-webui`.**
 - Preferably respect spacing and newlines shown above. This might not be possible yet with some front-ends.
 - Replace `Character` and `User` in the above template with your desired names.
 - The model expects the characters to use third-person narration in simple past and enclose dialogues within standard quotation marks `" "`.
@@ -132,21 +131,26 @@ Then, preferably use [SillyTavern](https://github.com/SillyTavern/SillyTavern) a

 ![SillyTavern settings](https://i.imgur.com/gDPC8gx.png)

- **Important! Disable "Add BOS token"**. It is also recommended to enable "Ban EOS Token" and "Skip Special Tokens" (the model does not use them).
-
- ![Disabled BOS and EOS](https://i.imgur.com/9nlmV0q.png)
-
 To take advantage of this model's larger context length, unlock the context size and set it up to any length up to 4096 tokens, depending on your VRAM constraints.

 ![Unlock context size](https://files.catbox.moe/5vgpjt.png)

 ## Training Details

 ### Training Data

 <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- The training data comprises **1005** manually edited roleplaying conversation threads from various Internet RP forums, for about 11 megabytes of data.

 Character and Scenario information was filled in for every thread with the help of mainly `gpt-4`, but otherwise conversations in the dataset are almost entirely human-generated except for a handful of messages. Character names in the RP stories have been isolated and replaced with standard placeholder strings. Usernames, out-of-context (OOC) messages and personal information have not been intentionally included.

@@ -154,18 +158,15 @@ Character and Scenario information was filled in for every thread with the help

 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- [QLoRA](https://arxiv.org/abs/2305.14314) by Dettmers et al. was used to finetune this model on a single consumer GPU.

- #### Training Hyperparameters

 The most important settings for QLoRA were as follows:

- - --dataset-format input-output
- - --train_on_source True
 - --learning_rate 0.00006
 - --lr_scheduler_type cosine
- - --lora_r 32 (7B LoRA), 8 (13B LoRA)
- - --max_steps -1
 - --num_train_epochs 2
 - --bf16 True
 - --bits 4
@@ -175,7 +176,7 @@ The most important settings for QLoRA were as follows:

 An effective batch size of 1 was found to yield the lowest loss curves during fine-tuning.

- It was also found that using `--train_on_source False` with the entire training example at the output yields similar results.

 <!-- ## Evaluation -->

@@ -185,4 +186,4 @@ It was also found that using `--train_on_source False` with the entire training

 <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

- Finetuning this model requires about 1 kWh (7B LoRA) or 2.1 kWh (13B LoRA) of electricity for 2 epochs, excluding testing.
 

 LIMARP-Llama2 is an experimental [Llama2](https://huggingface.co/meta-llama) finetune narrowly focused on novel-style roleplay chatting.

+ To considerably facilitate uploading and distribution, LoRA adapters have been provided instead of the merged models. You should get the Llama2 base model first, either from Meta or from one of the reuploads on HuggingFace (for example [here](https://huggingface.co/NousResearch/Llama-2-7b-hf) and [here](https://huggingface.co/NousResearch/Llama-2-13b-hf)). It is also possible to apply the LoRAs to different Llama2-based models, although this is largely untested and the final results may not work as intended.
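The adapter application described above can be sketched with the `peft` library. This is a minimal, hedged example, not the card's official loading recipe: the base repo id is one of the reuploads linked above, and the local adapter path is a placeholder for wherever this repository's files were downloaded.

```python
def load_base_with_lora(base_id: str, adapter_path: str):
    """Load a Llama2 base model and apply a LoRA adapter on top of it."""
    # Imports are kept inside the function because these are heavy optional deps.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
    # PeftModel wraps the base model with the adapter weights; calling
    # .merge_and_unload() afterwards would bake the adapter into the base.
    return PeftModel.from_pretrained(base, adapter_path)

# Example usage (downloads several GB of weights; paths are placeholders):
# model = load_base_with_lora("NousResearch/Llama-2-7b-hf", "./limarp-llama2-7b")
```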

 ## Model Details

 

 ### More detailed notes on prompt format and other settings
 - **The model has been tested mainly using Oobabooga's `text-generation-webui` as a backend**
 - Preferably respect spacing and newlines shown above. This might not be possible yet with some front-ends.
 - Replace `Character` and `User` in the above template with your desired names.
 - The model expects the characters to use third-person narration in simple past and enclose dialogues within standard quotation marks `" "`.
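The name substitution described in the notes above is a plain string replacement. A trivial sketch (the template string here is only a stand-in, not the card's actual prompt format):

```python
def fill_names(template: str, character: str, user: str) -> str:
    # Replace the standard `Character`/`User` placeholders with desired names.
    # `Character` is replaced first so a user name containing "Character"
    # cannot be re-substituted.
    return template.replace("Character", character).replace("User", user)

# Stand-in template, not the card's exact format:
template = 'Character\'s Persona: a stoic knight.\nUser: "Hello there!"\nCharacter:'
print(fill_names(template, "Alice", "Bob"))
```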
 

 ![SillyTavern settings](https://i.imgur.com/gDPC8gx.png)

 To take advantage of this model's larger context length, unlock the context size and set it up to any length up to 4096 tokens, depending on your VRAM constraints.

 ![Unlock context size](https://files.catbox.moe/5vgpjt.png)

+ A previous version of this model was trained _without_ BOS/EOS tokens, but these have now been tentatively added back, so it is no longer necessary to disable them as previously indicated. No significant difference is observed in the outputs after loading the LoRAs with regular `transformers`. However, it is still **recommended to disable the EOS token**, as it can apparently cause [artifacts or tokenization issues](https://files.catbox.moe/cxfrzu.png) when it gets generated close to punctuation or quotation marks, at least in SillyTavern. These would typically happen with AI responses.
+
+ ![Ban EOS](https://files.catbox.moe/xslnhb.png)
+
 ## Training Details

 ### Training Data

 <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

+ The training data comprises about **1000** manually edited roleplaying conversation threads from various Internet RP forums, for about 11 megabytes of data.

 Character and Scenario information was filled in for every thread with the help of mainly `gpt-4`, but otherwise conversations in the dataset are almost entirely human-generated except for a handful of messages. Character names in the RP stories have been isolated and replaced with standard placeholder strings. Usernames, out-of-context (OOC) messages and personal information have not been intentionally included.
 
 

 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

+ The version of LIMARP initially uploaded in this repository was trained using [QLoRA](https://arxiv.org/abs/2305.14314) by Dettmers et al. on a single consumer GPU (RTX 3090). Later on, a small NVIDIA A40 cluster was used, and training was performed in 8-bit with regular LoRA adapters.

+ #### Training Hyperparameters initially used with QLoRA

 The most important settings for QLoRA were as follows:

 - --learning_rate 0.00006
 - --lr_scheduler_type cosine
+ - --lora_r 8
 - --num_train_epochs 2
 - --bf16 True
 - --bits 4
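Assembled into a command line, the flags above would look roughly as follows. This is a sketch, not the exact command used for this model: the script name follows the upstream QLoRA repository, and the model and dataset paths are placeholders; all other flags are left at their defaults.

```shell
# Hypothetical qlora.py invocation with the hyperparameters listed above;
# model and dataset paths are placeholders, not the actual ones used.
python qlora.py \
    --model_name_or_path ./llama-2-13b-hf \
    --dataset ./limarp-data \
    --learning_rate 0.00006 \
    --lr_scheduler_type cosine \
    --lora_r 8 \
    --num_train_epochs 2 \
    --bf16 True \
    --bits 4
```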
 

 An effective batch size of 1 was found to yield the lowest loss curves during fine-tuning.

+ It was also found that using `--train_on_source False` with the entire training example as the output yields similar results. These LoRAs have been trained in this way (similar to what was done with [Guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco), or as with unsupervised finetuning).

 <!-- ## Evaluation -->
 
 

 <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

+ Finetuning this model on a single RTX 3090-equipped PC requires about 1 kWh (7B) or 2.1 kWh (13B) of electricity for 2 epochs, excluding testing.