pipeline_tag: text-generation
tags:
- axolotl
library_name: transformers
---

# 📟 Relay v0.1 (Mistral Nemo 2407)

<img src="https://cdn-uploads.huggingface.co/production/uploads/60f808c5c1adf9100f1f263c/O1M2VpN9Fg-dYL8Z-vhpb.png" width="600" />

## Table of Contents

- [Introduction: LLM as IRC](#introduction-llm-as-irc)
- [How to use](#how-to-use)
- [Safety testing](#safety-testing)
- [Fine-tuning setup](#fine-tuning-setup)
- [Limitations](#limitations)
- [License](#license)

## Introduction: LLM as IRC

Relay is motivated by this question: what does it take to chat with a base LLM?

Several papers (e.g., [URIAL](https://arxiv.org/abs/2312.01552)) have shown that base models can be used more reliably than expected. At the same time, we increasingly find that RLHF and other post-training approaches may limit the creativity of LLMs.

LLMs can be more than smart assistants. In fact, they have the potential to emulate all sorts of behaviours or patterns found in their pre-training datasets (usually a large chunk of the internet).

Relay focuses on a particular pattern that should be relatively frequent in pre-training datasets: IRC chats/logs. IRC provides a rich context for conversational modeling, combining natural dialogue with command-based interactions. Yet, it remains largely overlooked.

We found that base LLMs as small as 12B can be sufficiently familiar with the basic formatting of IRC to enable the generation of synthetic conversational datasets (see [based-chat-v0.1](https://huggingface.co/datasets/danlou/based-chat-v0.1-Mistral-Nemo-Base-2407)). These synthetic conversations can then be used to fine-tune LLMs towards reliable turn-based dialogue, within an implicit IRC context that also supports the use of commands.
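
To make the idea concrete, here is a minimal sketch of how a turn-based conversation might be rendered as IRC-style log lines for prompting. The nicknames and the `<nick> message` line format are illustrative assumptions for this sketch, not the actual based-chat format:

```python
# Sketch: render (speaker, message) turns as IRC-style log lines.
# Nicknames and line format are assumptions, not the based-chat spec.

def render_irc_log(turns, user="user", bot="relay"):
    """Join conversation turns into '<nick> message' lines, one per turn."""
    nicks = {"user": user, "bot": bot}
    return "\n".join(f"<{nicks[speaker]}> {message}" for speaker, message in turns)

prompt = render_irc_log([
    ("user", "hi, anyone around?"),
    ("bot", "hey! what's up?"),
])
print(prompt)
# <user> hi, anyone around?
# <relay> hey! what's up?
```

A log rendered this way doubles as both a fine-tuning sample and an inference-time prompt, since the model only ever sees plain IRC-formatted text.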

Assuming the model used for fine-tuning is the same as the one used to generate the synthetic dataset, this conversational model is essentially trained with self-supervision (except for conversation starters): no preference datasets or methods, not even instruction tuning. The fine-tuning approach is also lightweight: 4-bit QLoRA (see [Fine-tuning setup](#fine-tuning-setup)).
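
As a rough illustration of what such a lightweight setup looks like, here is a hypothetical axolotl config fragment for 4-bit QLoRA. All hyperparameter values below are assumptions for the sketch; the actual training configuration is not reproduced here:

```yaml
# Illustrative axolotl QLoRA fragment; values are assumptions, not the real config.
base_model: mistralai/Mistral-Nemo-Base-2407
load_in_4bit: true
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
datasets:
  - path: danlou/based-chat-v0.1-Mistral-Nemo-Base-2407
    type: completion
```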

Nevertheless, Relay can simulate more natural conversations (it's not an assistant), and supports several other applications through creative use of commands (see [How to use](#how-to-use)).
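
The command-based side of the IRC pattern can be sketched with a minimal parser that separates `/command` turns from plain dialogue. The command names here are hypothetical examples, not Relay's actual command set:

```python
# Sketch: split IRC-style turns into commands vs. plain messages.
# Command names are hypothetical; this is not Relay's command set.

def parse_turn(text):
    """Return ('command', name, args) for '/cmd args', else ('message', text)."""
    if text.startswith("/"):
        name, _, args = text[1:].partition(" ")
        return ("command", name, args)
    return ("message", text)

print(parse_turn("/roll 2d6"))     # ('command', 'roll', '2d6')
print(parse_turn("good morning"))  # ('message', 'good morning')
```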

Post-training methods also support the safety and alignment of LLMs. This is an important concern that was addressed in the development of the based-chat synthetic dataset, and tested for with the resulting fine-tuned model (see [Safety testing](#safety-testing)).

## How to use

TODO

## Safety testing

TODO

## Fine-tuning setup

TODO

## Limitations

TODO

## License

TODO