danlou committed
Commit fb7c973 · verified · 1 Parent(s): e999315

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -29,13 +29,13 @@ Relay is motivated by this question: What does it take to chat with a base LLM?
 Several papers (e.g., [URIAL](https://arxiv.org/abs/2312.01552)) have shown that base models can be used more reliably than expected. At the same time, we also increasingly find that RLHF, and other post-training approaches, may limit the creativity of LLMs.
 LLMs can be more than smart assistants. In fact, they should have the potential to emulate all sorts of behaviours or patterns found in their pre-training datasets (usually a large chunk of the internet).
 
-Relay is focused on a particular pattern that should be relatively frequent in pre-training datasets: IRC chats/logs. IRC provides a rich context for conversational modeling, combining natural dialogue with command-based interactions. Yet, it remains largely overlooked.
+Relay is focused on a particular pattern that should be relatively frequent in pre-training datasets: IRC chats. IRC provides a rich context for conversational modeling, combining natural dialogue with command-based interactions. Yet, it remains largely overlooked.
 We found that base LLMs, as small as 12B, can be sufficiently familiar with the basic formatting of IRC to enable the generation of synthetic conversational datasets (see [based-chat-v0.1](https://huggingface.co/datasets/danlou/based-chat-v0.1-Mistral-Nemo-Base-2407)). These synthetic conversations can then be used to fine-tune LLMs towards unlocking reliable turn-based dialogue, within an implicit IRC context that supports use of commands as well.
 
-Assuming the model used for fine-tuning is the same used for the synthetic dataset, this conversational model is essentially trained with self-supervision (except for conversation starters): no preference datasets/methods, not even any instruct-tuning. The fine-tuning approach is also lightweight: 4-bit QLoRa (see [Fine-tuning Setup](#fine-tuning-setup)).
+Assuming the model used for fine-tuning is the same used for the synthetic dataset, this conversational model is essentially trained with self-supervision (except for conversation starters): no instruct datasets or reward methods. The fine-tuning approach is also lightweight: 4-bit QLoRa (see [Fine-tuning Setup](#fine-tuning-setup)).
 Nevertheless, Relay can simulate more natural conversations (it’s not an assistant), besides several other applications through creative use of commands (see [How to use](#how-to-use)).
 
-Post-training methods also support the safety and alignment of LLMs. This is an important concern that was addressed in the development of the based-chat synthetic dataset, and tested for with the resulting fine-tuned model (see [Safety testing](#safety-testing)).
+Post-training methods also support the safety and alignment of LLMs. This important concern was also addressed in the development of the based-chat synthetic dataset, and tested for with the resulting fine-tuned model (see [Safety testing](#safety-testing)).
 
 ## How to use
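
The synthetic data generation described in the diff above comes down to letting a base model continue an IRC-style log. Below is a minimal sketch of that idea with `transformers`, assuming `mistralai/Mistral-Nemo-Base-2407` as the base model; the log format, nicknames, and sampling settings are illustrative, not the ones used to build based-chat-v0.1.

```python
# Sketch: a base LLM continuing an IRC-style log (illustrative format, not the
# actual based-chat-v0.1 pipeline).
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-Nemo-Base-2407"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", torch_dtype="auto")

# An IRC-style context: the base model simply continues the log, turn by turn.
prompt = (
    "[21:03] <ada> anyone around?\n"
    "[21:04] <lee> yeah, just got back. what's up?\n"
    "[21:04] <ada>"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=60, do_sample=True, temperature=0.8)

# Print only the newly generated continuation of the log.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

In practice, sampled continuations like these would still need filtering and structuring into turns before serving as fine-tuning data.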
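The "lightweight: 4-bit QLoRa" fine-tuning mentioned in the diff typically means loading the base model quantized to 4-bit (NF4) and training small LoRA adapters on top of it. Here is a minimal sketch with `transformers`, `peft`, and `bitsandbytes`; the rank, target modules, and other hyperparameters are assumptions, and the actual configuration is documented in the README's Fine-tuning Setup section.

```python
# Sketch: 4-bit QLoRA setup (hypothetical hyperparameters, not the repo's actual config).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_id = "mistralai/Mistral-Nemo-Base-2407"  # assumed base model

# Load the base model quantized to 4-bit (NF4), with compute in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)

# Attach small LoRA adapters; only these are trained, the 4-bit base stays frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of parameters is trainable

# Training on the based-chat conversations (e.g., with a standard Trainer) would follow here.
```

Since only the adapter weights are updated while the quantized base weights stay frozen, the memory footprint remains small, which is what makes the approach lightweight.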