pipeline_tag: text-generation
tags:
- axolotl
library_name: transformers
---

# 📟 Relay v0.1 (Mistral Nemo 2407)

<img src="https://cdn-uploads.huggingface.co/production/uploads/60f808c5c1adf9100f1f263c/O1M2VpN9Fg-dYL8Z-vhpb.png" width="600" />

## Table of Contents

- [Introduction: LLM as IRC](#introduction-llm-as-irc)
- [How to use](#how-to-use)
- [Safety testing](#safety-testing)
- [Fine-tuning setup](#fine-tuning-setup)
- [Limitations](#limitations)
- [License](#license)

## Introduction: LLM as IRC

Relay is motivated by this question: what does it take to chat with a base LLM?

Several papers (e.g., [URIAL](https://arxiv.org/abs/2312.01552)) have shown that base models can be used more reliably than expected. At the same time, we increasingly find that RLHF and other post-training approaches may limit the creativity of LLMs.

LLMs can be more than smart assistants. In fact, they have the potential to emulate all sorts of behaviours or patterns found in their pre-training datasets (usually a large chunk of the internet).

Relay focuses on a particular pattern that should be relatively frequent in pre-training datasets: IRC chats/logs. IRC provides a rich context for conversational modeling, combining natural dialogue with command-based interactions. Yet, it remains largely overlooked.

We found that base LLMs as small as 12B can be sufficiently familiar with the basic formatting of IRC to enable the generation of synthetic conversational datasets (see [based-chat-v0.1](https://huggingface.co/datasets/danlou/based-chat-v0.1-Mistral-Nemo-Base-2407)). These synthetic conversations can then be used to fine-tune LLMs towards reliable turn-based dialogue, within an implicit IRC context that also supports the use of commands.
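
To make the idea concrete, here is a minimal sketch of how a turn-based conversation might be rendered as IRC-style log lines for prompting. The nicknames and the `<nick> message` line format are illustrative assumptions for this sketch, not the actual based-chat format:

```python
# Sketch: render (speaker, message) turns as IRC-style log lines.
# Nicknames and line format are assumptions, not the based-chat spec.

def render_irc_log(turns, user="user", bot="relay"):
    """Join conversation turns into '<nick> message' lines, one per turn."""
    nicks = {"user": user, "bot": bot}
    return "\n".join(f"<{nicks[speaker]}> {message}" for speaker, message in turns)

prompt = render_irc_log([
    ("user", "hi, anyone around?"),
    ("bot", "hey! what's up?"),
])
print(prompt)
# <user> hi, anyone around?
# <relay> hey! what's up?
```

A log rendered this way doubles as both a fine-tuning sample and an inference-time prompt, since the model only ever sees plain IRC-formatted text.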

Assuming the model used for fine-tuning is the same as the one used to generate the synthetic dataset, this conversational model is essentially trained with self-supervision (except for conversation starters): no preference datasets or methods, not even instruction tuning. The fine-tuning approach is also lightweight: 4-bit QLoRA (see [Fine-tuning setup](#fine-tuning-setup)).
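
As a rough illustration of what such a lightweight setup looks like, here is a hypothetical axolotl config fragment for 4-bit QLoRA. All hyperparameter values below are assumptions for the sketch; the actual training configuration is not reproduced here:

```yaml
# Illustrative axolotl QLoRA fragment; values are assumptions, not the real config.
base_model: mistralai/Mistral-Nemo-Base-2407
load_in_4bit: true
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
datasets:
  - path: danlou/based-chat-v0.1-Mistral-Nemo-Base-2407
    type: completion
```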

Nevertheless, Relay can simulate more natural conversations (it's not an assistant), and supports several other applications through creative use of commands (see [How to use](#how-to-use)).
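
The command-based side of the IRC pattern can be sketched with a minimal parser that separates `/command` turns from plain dialogue. The command names here are hypothetical examples, not Relay's actual command set:

```python
# Sketch: split IRC-style turns into commands vs. plain messages.
# Command names are hypothetical; this is not Relay's command set.

def parse_turn(text):
    """Return ('command', name, args) for '/cmd args', else ('message', text)."""
    if text.startswith("/"):
        name, _, args = text[1:].partition(" ")
        return ("command", name, args)
    return ("message", text)

print(parse_turn("/roll 2d6"))     # ('command', 'roll', '2d6')
print(parse_turn("good morning"))  # ('message', 'good morning')
```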

Post-training methods also support the safety and alignment of LLMs. This is an important concern that was addressed in the development of the based-chat synthetic dataset, and tested for with the resulting fine-tuned model (see [Safety testing](#safety-testing)).

## How to use

TODO

## Safety testing

TODO

## Fine-tuning setup

TODO

## Limitations

TODO

## License

TODO