
Model Card for DistillChat V2.3

DistillChat V2.3 is a fine-tuned version of Mistral trained on a mixture of publicly available, synthetic, and human-created datasets.

The difference from DistillChat V1.0 is that, following openchat-3.5, we use the "Code" condition rather than the "GPT-4 Correct" condition for code-related datasets.
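
For illustration, a code-conditioned prompt presumably follows openchat-3.5's coding mode and would look roughly like the line below; the exact conditioning string is taken from openchat-3.5 and is an assumption, not confirmed by this card:

Code User: Implement quicksort in Python<|end_of_turn|>Code Assistant: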

The V2.3 release is the "merge-single-teacher-distill" variant: we use Nous-Capybara-34B and Phind-CodeLlama-34B-v2 as teacher LLMs for knowledge distillation, and then merge the distilled student LLMs to fuse their knowledge.
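
The merging procedure itself is not specified in this card. As an illustration only, a simple uniform parameter average of the two distilled student checkpoints (a common merging baseline, not necessarily the method used here; the checkpoint paths are placeholders) could look like:

import torch
from transformers import AutoModelForCausalLM

# Placeholder paths: the individual distilled students are not released under these names.
student_a = AutoModelForCausalLM.from_pretrained("path/to/student_distilled_from_nous_capybara", torch_dtype=torch.bfloat16)
student_b = AutoModelForCausalLM.from_pretrained("path/to/student_distilled_from_phind_codellama", torch_dtype=torch.bfloat16)

# Uniform average of all parameters; non-uniform merge weights are equally possible.
state_a, state_b = student_a.state_dict(), student_b.state_dict()
merged = {name: (state_a[name] + state_b[name]) / 2 for name in state_a}

student_a.load_state_dict(merged)
student_a.save_pretrained("distillchat_v2.3_merged")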

Model description

  • Model type: Distill-SFT
  • Language(s) (NLP): Primarily English
  • Finetuned from model: openchat/openchat_3.5

Model Sources

Performance

TBD

Input Format

The model is trained to use the following format:

GPT4 Correct User: Your message here!<|end_of_turn|>GPT4 Correct Assistant:

For best results, format all inputs in this manner.

Intended uses & limitations

The model was initially fine-tuned on a filtered and preprocessed version of the DistillChat V1 Mixture, which contains a diverse range of human-created instructions and synthetic dialogues generated primarily by other LLMs.

Here's how you can run the model using the 🤗 Transformers library:

import transformers
tokenizer = transformers.AutoTokenizer.from_pretrained("Wanfq/distillchat_v2.3")
# Single-turn
tokens = tokenizer("GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:").input_ids
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]
# Multi-turn
tokens = tokenizer("GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:").input_ids
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747, 15359, 32000, 420, 6316, 28781, 3198, 3123, 1247, 28747, 1602, 460, 368, 3154, 28804, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]

The GPT4 template is also available as the integrated tokenizer.chat_template, which can be used instead of manually specifying the template:

messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "How are you today?"}
]
tokens = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747, 15359, 32000, 420, 6316, 28781, 3198, 3123, 1247, 28747, 1602, 460, 368, 3154, 28804, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]
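
To generate a reply rather than just inspect token ids, a standard Transformers generation loop can be used. The following is a minimal sketch; the generation settings are illustrative defaults, not values recommended by this card, and stopping on <|end_of_turn|> is an assumption based on the format shown above:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Wanfq/distillchat_v2.3")
model = AutoModelForCausalLM.from_pretrained("Wanfq/distillchat_v2.3", torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "How are you today?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

# <|end_of_turn|> closes the assistant turn, so treat it as the end-of-sequence token.
output = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|end_of_turn|>"),
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))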

Bias, Risks, and Limitations

The DistillChat models have not been aligned to generate safe completions with an RLHF phase, nor deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). The size and composition of the corpus used to train the base Mistral models are also unknown, but it likely included a mix of web data and technical sources such as books and code. See the Falcon 180B model card for an example of this.

Training hyperparameters

The following hyperparameters were used during Distill-SFT training (a sketch of how the distillation settings might combine appears after the list):

  • learning_rate: 2e-06
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1.0
  • model_max_length: 2048
  • do_distill: True
  • distill_with_ref_model: False
  • distill_with_aligned_model_0: True
  • distill_with_aligned_model_1: False
  • distill_optimize_type: ranking
  • distill_weight_type: soft_metric
  • distill_loss_type: ce
  • distill_teacher_temperature: 1.0
  • lm_loss_weight: 0.7
  • distill_greater_as_gt: True
  • distill_greater_as_gt_type: hard_and_decay
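
This card does not include the loss implementation. As a hedged sketch only, the settings above (distill_loss_type: ce, distill_teacher_temperature, lm_loss_weight) suggest a token-level combination along these lines; the function name and the omission of the ranking/soft_metric options are assumptions:

import torch
import torch.nn.functional as F

def distill_sft_loss(student_logits, teacher_logits, labels, lm_loss_weight=0.7, temperature=1.0):
    # Standard SFT cross-entropy against the gold next tokens (positions labeled -100 are ignored).
    lm_loss = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        labels.reshape(-1),
        ignore_index=-100,
    )
    # Cross-entropy between temperature-scaled teacher and student token distributions.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    kd_loss = -(teacher_probs * student_log_probs).sum(dim=-1).mean()
    # Weighted combination; with lm_loss_weight=0.7 the SFT term dominates.
    return lm_loss_weight * lm_loss + (1.0 - lm_loss_weight) * kd_loss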

Citation

TBD

Acknowledgements

TBD

Model card adapted from Tulu 2
