- [License](#license)
- [Citation](#citation)

## Introduction: LLMs as IRCs

Relay is motivated by this question: What does it take to chat with a base LLM?

Nevertheless, Relay can simulate more natural conversations (it's not an assistant).

Post-training methods also support the safety and alignment of LLMs. This important concern was also addressed in the development of the based-chat synthetic dataset, and tested for with the resulting fine-tuned model (see [Safety testing](#safety-testing)).

## How to use

If you have a CUDA GPU (>=12GB VRAM), the best way to use Relay is with the [relaylm.py](https://github.com/danlou/relay/blob/main/relaylm.py) inference script. Just run:
```bash
curl https://danlou.co/f/relaylm.py | python -
```

This script will select the best model for the available VRAM, then download it, load it, and start an interactive chat session.
It does not have any dependencies besides `transformers >= 4.45.1`. You can also download the script manually and run it with python, of course.
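For example, the manual route might look like this (a sketch assuming a standard download tool; the script URL is the same one used above):

```bash
# Download the script once, then run it locally.
curl -O https://danlou.co/f/relaylm.py
python relaylm.py
```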

If you want to use a particular model, you can pass the model name as an argument:

```bash
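# A plausible invocation, assuming the script accepts a Hugging Face model ID
# as a positional argument (hypothetical form):
curl https://danlou.co/f/relaylm.py | python - <model_name>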
```
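Relay can also be used as a library, as in the `favorite_holiday` example below. The sketch reconstructs that style of usage; `suggest_relay_model`, `RelayLM`, and the method calls shown are assumptions rather than confirmed API:

```python
# Sketch only: names below are assumptions inferred from the surviving line
# `print(favorite_holiday(relay, 'China'))`, not confirmed relaylm.py API.
from relaylm import suggest_relay_model, RelayLM

def favorite_holiday(relay: RelayLM, country: str) -> str:
    relay.init_context()  # start a fresh conversation context
    relay.message(role='input', content=f"I'm from {country}.")
    relay.message(role='prompt', content='What is your favorite holiday?')
    relay.respond(role='output')  # generate the model's reply
    return relay.get_last()['content']

model_info = suggest_relay_model()  # pick a model that fits the available VRAM
relay = RelayLM(**model_info)
print(favorite_holiday(relay, 'China'))
```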
More examples are available in the [project's GitHub](https://github.com/danlou/relay).

## Safety testing

While this model is intended for research purposes, it's still relevant to explore how this conversational model (and its self-supervised approach) compares on safety risk against other conversational models trained on the same base LLM.

As can be seen in the plot below, Relay v0.1 refuses to answer the majority of these prompts.

<img src="https://cdn-uploads.huggingface.co/production/uploads/60f808c5c1adf9100f1f263c/0m-dMagE7yKy1V-EB-fJ3.png" width="800"/>

It's also worth noting that some refusals are a variation of "I don't know". As seen with all LLMs, bad actors may be able to find ways around refusals. Please use this model responsibly.

## Fine-tuning setup

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

This model is a merge of [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407) with a QLoRA adapter trained on [based-chat-v0.1](https://huggingface.co/datasets/danlou/based-chat-v0.1-Mistral-Nemo-Base-2407) using axolotl.
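One typical way to fold a QLoRA adapter into its base model is with `peft` (not necessarily the exact procedure used here; the adapter path is hypothetical):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model, attach the trained adapter, then merge the LoRA
# weights into the base weights and save the standalone merged model.
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-Nemo-Base-2407")
model = PeftModel.from_pretrained(base, "path/to/based-chat-qlora-adapter")  # hypothetical path
merged = model.merge_and_unload()
merged.save_pretrained("relay-merged")
```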

Main details:
- Approach: 4-bit QLoRA (rank=8, alpha=16, all linear layers)
- Template: ChatML (with `user`/`anon` roles, instead of standard `assistant`/`user`; see the sketch after this list)
- Training Time: 9h 6m 12s on a single RTX 4090 (1 epoch)
- Train Loss: 0.6633
- Eval Loss: 0.6881
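
As a reference for the template above, a minimal sketch of how a single turn would be framed, assuming standard ChatML delimiters (the exact role-to-speaker mapping follows the based-chat dataset):

```python
# Sketch: frame one message with standard ChatML delimiters; 'user' and
# 'anon' stand in for the usual 'assistant'/'user' role pair here.
def chatml_turn(role: str, message: str) -> str:
    return f"<|im_start|>{role}\n{message}<|im_end|>\n"

print(chatml_turn('anon', 'hi all, what is everyone listening to these days?'))
```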

Full details:
- Training Config: see [axolotl config](https://github.com/danlou/relay/blob/main/axolotl_configs/relay-v0_1-Mistral-Nemo-Base-2407.yml)
- Training Run: see [W&B workspace](https://wandb.ai/danlou/relay-v0-1/runs/c0bz0xal/workspace?nw=nwuserdanlou)

## Limitations

This is not a typical AI Assistant. It should perform worse on benchmarks compared to instruct variants.

QLoRA 4-bit fine-tuning may be too coarse to preserve the integrity of pre-training knowledge.

## License

This model is licensed under [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
While [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407) is licensed under Apache 2.0, this Relay fine-tune is trained with a CC-BY-NC 4.0 dataset ([based-chat-v0.1](https://huggingface.co/datasets/danlou/based-chat-v0.1-Mistral-Nemo-Base-2407)).
The `relaylm.py` script is Apache 2.0.

## Citation

If you use Relay in your research, please cite it as follows: