danlou committed · Commit e999315 · verified · 1 Parent(s): 136949b

Update README.md

Files changed (1): README.md (+34 −14)
README.md CHANGED
@@ -7,6 +7,7 @@ base_model:
 pipeline_tag: text-generation
 tags:
 - axolotl
+library_name: transformers
 ---
 
 # 📟 Relay v0.1 (Mistral Nemo 2407)
@@ -14,26 +15,45 @@ tags:
 <img src="https://cdn-uploads.huggingface.co/production/uploads/60f808c5c1adf9100f1f263c/O1M2VpN9Fg-dYL8Z-vhpb.png" width="600" />
 
 ## Table of Contents
-- Introduction: LLMs as IRC
-- How to Use
-- Safety Testing
-- Fine-tuning Setup
-- Limitations
+- [Introduction: LLMs as IRC](#introduction-llms-as-irc)
+- [How to use](#how-to-use)
+- [Safety testing](#safety-testing)
+- [Fine-tuning setup](#fine-tuning-setup)
+- [Limitations](#limitations)
+- [License](#license)
 
-# Introduction: LLM as IRC
+## Introduction: LLMs as IRC
 
-Relay is motivated by the following question: What does it take to chat with a base LLM?
+Relay is motivated by this question: What does it take to chat with a base LLM?
 
-Several papers (e.g., URIAL) have shown that base models can be used more reliably than expected. At the same time, we also increasingly find that RLHF, and other post-training approaches, may limit the creativity of LLMs.
-
-LLMs can be more than smart assistants. In fact, they should have the potential to assume all sorts of behaviours or patterns found in their pre-training datasets (usually some large chunk of the internet).
+Several papers (e.g., [URIAL](https://arxiv.org/abs/2312.01552)) have shown that base models can be used more reliably than expected. At the same time, we also increasingly find that RLHF, and other post-training approaches, may limit the creativity of LLMs.
+LLMs can be more than smart assistants. In fact, they should have the potential to emulate all sorts of behaviours or patterns found in their pre-training datasets (usually a large chunk of the internet).
 
-Relay is focused on a particular pattern that should be relatively frequent in pre-training datasets: IRC chats/logs. IRC is an obvious choice for modelling conversational behaviour, yet it seems mostly neglected. The widespread use of commands with IRC also allows for modelling much more than a 2-way chat simulator.
-
-We found that base LLMs, as small as 12B, can be sufficiently familiar with the basic formatting of IRC to enable the generation of synthetic conversational datasets (see based-chat-v0.1). These synthetic conversations can then be used to fine-tune LLMs towards unlocking reliable turn-based dialogue, within an implicit IRC context that supports use of commands as well.
+Relay is focused on a particular pattern that should be relatively frequent in pre-training datasets: IRC chats/logs. IRC provides a rich context for conversational modeling, combining natural dialogue with command-based interactions. Yet, it remains largely overlooked.
+We found that base LLMs, as small as 12B, can be sufficiently familiar with the basic formatting of IRC to enable the generation of synthetic conversational datasets (see [based-chat-v0.1](https://huggingface.co/datasets/danlou/based-chat-v0.1-Mistral-Nemo-Base-2407)). These synthetic conversations can then be used to fine-tune LLMs towards unlocking reliable turn-based dialogue, within an implicit IRC context that supports use of commands as well.
 
-Assuming the fine-tuned model is the same used for the synthetic dataset, this conversational model will have been essentially trained with self-supervision (except for conversation starters), no preference data/methods, not even any instructions. The fine-tuning approach is also lightweight: 4-bit QLoRa (see Fine-tuning Setup section).
-
-Nevertheless, Relay can be used to simulate more natural conversations (it’s not an assistant), besides several other applications through creative use of commands (see How to Use for more details).
-
-Post-training methods are also responsible for ensuring the safety and alignment of LLMs. This is an important concern that was addressed in the development of the based-chat synthetic dataset, and evaluated on the resulting fine-tuned model(see Safety Testing section for more details).
+Assuming the model used for fine-tuning is the same used for the synthetic dataset, this conversational model is essentially trained with self-supervision (except for conversation starters): no preference datasets/methods, not even any instruct-tuning. The fine-tuning approach is also lightweight: 4-bit QLoRA (see [Fine-tuning setup](#fine-tuning-setup)).
+Nevertheless, Relay can simulate more natural conversations (it’s not an assistant), besides several other applications through creative use of commands (see [How to use](#how-to-use)).
 
+Post-training methods also support the safety and alignment of LLMs. This is an important concern that was addressed in the development of the based-chat synthetic dataset, and tested for with the resulting fine-tuned model (see [Safety testing](#safety-testing)).
+
+## How to use
+
+TODO
+
+## Safety testing
+
+TODO
+
+## Fine-tuning setup
+
+TODO
+
+## Limitations
+
+TODO
+
+## License
+
+TODO
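
The "implicit IRC context" the diff describes can be illustrated with a short sketch. The exact serialization and nicknames used by Relay v0.1 are not shown in this commit, so the `<nick>` log-line framing and the `/me` handling below are illustrative assumptions, not the model's actual template:

```python
# Hypothetical sketch of an IRC-style chat serialization, as motivated in the
# README text above. The exact log format and nicks used by Relay v0.1 are not
# shown in this commit; treat these as illustrative assumptions.

def format_irc_prompt(turns: list[tuple[str, str]]) -> str:
    """Serialize (nick, message) turns as IRC-style log lines."""
    lines = []
    for nick, message in turns:
        if message.startswith("/"):
            # Slash commands are where IRC goes beyond a 2-way chat simulator.
            # Many IRC clients log "/me waves" as an action line: "* nick waves".
            command, _, rest = message[1:].partition(" ")
            lines.append(f"* {nick} {rest}" if command == "me" else message)
        else:
            lines.append(f"<{nick}> {message}")
    return "\n".join(lines)


turns = [
    ("user", "hi there"),
    ("relay", "hello! what brings you here?"),
    ("user", "/me waves"),
]
print(format_irc_prompt(turns))
# → <user> hi there
#   <relay> hello! what brings you here?
#   * user waves
```

A string in this shape would serve both as the fine-tuning target format and, at inference time, as the prompt prefix from which the model continues the next `<relay>` turn.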
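
The "lightweight: 4-bit QLoRA" claim in the intro could be realized along these lines with `transformers` + `peft`. This is a generic configuration sketch under assumed hyperparameters (rank, alpha, target modules); the actual recipe belongs to the Fine-tuning setup section, which this commit leaves as TODO:

```python
# Generic 4-bit QLoRA configuration sketch. All hyperparameters here are
# assumptions for illustration, not the actual Relay v0.1 recipe.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # QLoRA: base weights quantized to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Nemo-Base-2407",     # base model named in the title
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                   # assumed adapter rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only adapters train; base stays frozen
```

In practice the commit's `axolotl` tag suggests this was driven by an axolotl config rather than hand-written training code, but the quantization/adapter split is the same.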