# 📟 Relay v0.1 (Mistral Nemo 2407)

<img src="https://cdn-uploads.huggingface.co/production/uploads/60f808c5c1adf9100f1f263c/O1M2VpN9Fg-dYL8Z-vhpb.png" width="600" />

## Table of Contents
- Introduction: LLMs as IRC
- How to Use
- Safety Testing
- Fine-tuning Setup
- Limitations

# Introduction: LLMs as IRC

Relay is motivated by the following question: What does it take to chat with a base LLM?

Several papers (e.g., URIAL) have shown that base models can be used more reliably than expected. At the same time, we also increasingly find that RLHF and other post-training approaches may limit the creativity of LLMs.

LLMs can be more than smart assistants. In fact, they have the potential to assume all sorts of behaviours or patterns found in their pre-training datasets (usually some large chunk of the internet).

Relay is focused on a particular pattern that should be relatively frequent in pre-training datasets: IRC chats/logs. IRC is an obvious choice for modelling conversational behaviour, yet it seems mostly neglected. The widespread use of commands in IRC also allows for modelling much more than a two-way chat simulator.

We found that base LLMs as small as 12B can be sufficiently familiar with the basic formatting of IRC to enable the generation of synthetic conversational datasets (see based-chat-v0.1). These synthetic conversations can then be used to fine-tune LLMs towards reliable turn-based dialogue, within an implicit IRC context that also supports the use of commands.
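
As a rough illustration of the idea (the exact log format used for based-chat-v0.1 is not specified here, so the layout below is an assumption), an IRC-style prompt for a base model can be as simple as a plain log prefix that the model then continues:

```python
# Illustrative sketch only: this log layout and the nicks are placeholders,
# not the actual based-chat-v0.1 format.

def irc_prompt(channel: str, lines: list[tuple[str, str]]) -> str:
    """Render (nick, message) pairs as an IRC-style log that a base
    model can plausibly continue with the next line."""
    header = f"*** Now talking in {channel}"
    body = "\n".join(f"<{nick}> {msg}" for nick, msg in lines)
    return f"{header}\n{body}\n<"  # trailing "<" invites the next nick

prompt = irc_prompt("#relay", [
    ("user", "hi! anyone around?"),
    ("relay", "hey, what's up?"),
])
print(prompt)
```

Sampling completions of such a prefix from a base model, then filtering, is one plausible way a synthetic turn-based dataset like based-chat-v0.1 could be assembled.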

Assuming the fine-tuned model is the same one used to generate the synthetic dataset, this conversational model has essentially been trained with self-supervision (except for the conversation starters): no preference data or methods, and not even any instructions. The fine-tuning approach is also lightweight: 4-bit QLoRA (see the Fine-tuning Setup section).

Nevertheless, Relay can be used to simulate more natural conversations (it’s not an assistant), besides several other applications enabled by creative use of commands (see the How to Use section for more details).

Post-training methods are also responsible for ensuring the safety and alignment of LLMs. This is an important concern that was addressed in the development of the based-chat synthetic dataset and evaluated on the resulting fine-tuned model (see the Safety Testing section for more details).