- [License](#license)
- [Citation](#citation)

## Introduction: LLMs as IRCs

Relay is motivated by this question: What does it take to chat with a base LLM?

Nevertheless, Relay can simulate more natural conversations (it's not an assistant).

Post-training methods also support the safety and alignment of LLMs. This important concern was also addressed in the development of the based-chat synthetic dataset, and tested for with the resulting fine-tuned model (see [Safety testing](#safety-testing)).

## How to use

If you have a CUDA GPU (>=12GB VRAM), the best way to use Relay is with the [relaylm.py](https://github.com/danlou/relay/blob/main/relaylm.py) inference script. Just run:
```bash
curl https://danlou.co/f/relaylm.py | python -
```

This script will select the best model for the available VRAM, then download it, load it, and start an interactive chat session.
It does not have any dependencies besides `transformers >= 4.45.1`. You can also download the script manually and run it with python, of course.
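For example, the manual route might look like this (a sketch assuming a standard download tool; the script URL is the same one used above):

```bash
# Download the script once, then run it locally.
curl -O https://danlou.co/f/relaylm.py
python relaylm.py
```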

If you want to use a particular model, you can pass the model name as an argument:

```bash
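# A plausible invocation, assuming the script accepts a Hugging Face model ID
# as a positional argument (hypothetical form):
curl https://danlou.co/f/relaylm.py | python - <model_name>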
```
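Relay can also be used as a library, as in the `favorite_holiday` example below. The sketch reconstructs that style of usage; `suggest_relay_model`, `RelayLM`, and the method calls shown are assumptions rather than confirmed API:

```python
# Sketch only: names below are assumptions inferred from the surviving line
# `print(favorite_holiday(relay, 'China'))`, not confirmed relaylm.py API.
from relaylm import suggest_relay_model, RelayLM

def favorite_holiday(relay: RelayLM, country: str) -> str:
    relay.init_context()  # start a fresh conversation context
    relay.message(role='input', content=f"I'm from {country}.")
    relay.message(role='prompt', content='What is your favorite holiday?')
    relay.respond(role='output')  # generate the model's reply
    return relay.get_last()['content']

model_info = suggest_relay_model()  # pick a model that fits the available VRAM
relay = RelayLM(**model_info)
print(favorite_holiday(relay, 'China'))
```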
More examples are available in the [project's GitHub](https://github.com/danlou/relay).

## Safety testing

While this model is intended for research purposes, it's still relevant to explore how this conversational model (and its self-supervised approach) compares on safety risk against other conversational models trained on the same base LLM.

As can be seen in the plot below, Relay v0.1 refuses to answer the majority of these prompts.

<img src="https://cdn-uploads.huggingface.co/production/uploads/60f808c5c1adf9100f1f263c/0m-dMagE7yKy1V-EB-fJ3.png" width="800"/>

It's also worth noting that some refusals are a variation of "I don't know". As seen with all LLMs, bad actors may be able to find ways around refusals. Please use this model responsibly.

## Fine-tuning setup

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

This model is a merge of [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407) with a QLoRA adapter trained on [based-chat-v0.1](https://huggingface.co/datasets/danlou/based-chat-v0.1-Mistral-Nemo-Base-2407) using axolotl.
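One typical way to fold a QLoRA adapter into its base model is with `peft` (not necessarily the exact procedure used here; the adapter path is hypothetical):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model, attach the trained adapter, then merge the LoRA
# weights into the base weights and save the standalone merged model.
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-Nemo-Base-2407")
model = PeftModel.from_pretrained(base, "path/to/based-chat-qlora-adapter")  # hypothetical path
merged = model.merge_and_unload()
merged.save_pretrained("relay-merged")
```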

Main details:
- Approach: 4-bit QLoRA (rank=8, alpha=16, all linear layers)
- Template: ChatML (with `user`/`anon` roles, instead of standard `assistant`/`user`; see the sketch after this list)
- Training Time: 9h 6m 12s on a single RTX 4090 (1 epoch)
- Train Loss: 0.6633
- Eval Loss: 0.6881
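
As a reference for the template above, a minimal sketch of how a single turn would be framed, assuming standard ChatML delimiters (the exact role-to-speaker mapping follows the based-chat dataset):

```python
# Sketch: frame one message with standard ChatML delimiters; 'user' and
# 'anon' stand in for the usual 'assistant'/'user' role pair here.
def chatml_turn(role: str, message: str) -> str:
    return f"<|im_start|>{role}\n{message}<|im_end|>\n"

print(chatml_turn('anon', 'hi all, what is everyone listening to these days?'))
```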

Full details:
- Training Config: see [axolotl config](https://github.com/danlou/relay/blob/main/axolotl_configs/relay-v0_1-Mistral-Nemo-Base-2407.yml)
- Training Run: see [W&B workspace](https://wandb.ai/danlou/relay-v0-1/runs/c0bz0xal/workspace?nw=nwuserdanlou)

## Limitations

This is not a typical AI Assistant. It should perform worse on benchmarks compared to instruct variants.

QLoRA 4-bit fine-tuning may be too coarse to preserve the integrity of pre-training knowledge.

## License

This model is licensed under [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
While [Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407) is licensed under Apache 2.0, this Relay fine-tune is trained with a CC-BY-NC 4.0 dataset ([based-chat-v0.1](https://huggingface.co/datasets/danlou/based-chat-v0.1-Mistral-Nemo-Base-2407)).
The `relaylm.py` script is Apache 2.0.

## Citation

If you use Relay in your research, please cite it as follows: