
Model Card for DistillChat V2.3

DistillChat V2.3 is a fine-tuned version of Mistral trained on a mixture of publicly available, synthetic, and human-created datasets.

The difference from DistillChat V1.0 is that, following openchat-3.5, we use the "Code" condition rather than the "GPT-4 Correct" condition for code-related datasets.
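
For illustration, a code-conditioned prompt presumably follows openchat-3.5's coding mode and would look roughly like the line below; the exact conditioning string is taken from openchat-3.5 and is an assumption, not confirmed by this card:

Code User: Implement quicksort in Python<|end_of_turn|>Code Assistant: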

The V2.3 release is the "merge-single-teacher-distill" variant: we use Nous-Capybara-34B and Phind-CodeLlama-34B-v2 as teacher LLMs for knowledge distillation, and then merge the distilled student LLMs to fuse their knowledge.
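
The merging procedure itself is not specified in this card. As an illustration only, a simple uniform parameter average of the two distilled student checkpoints (a common merging baseline, not necessarily the method used here; the checkpoint paths are placeholders) could look like:

import torch
from transformers import AutoModelForCausalLM

# Placeholder paths: the individual distilled students are not released under these names.
student_a = AutoModelForCausalLM.from_pretrained("path/to/student_distilled_from_nous_capybara", torch_dtype=torch.bfloat16)
student_b = AutoModelForCausalLM.from_pretrained("path/to/student_distilled_from_phind_codellama", torch_dtype=torch.bfloat16)

# Uniform average of all parameters; non-uniform merge weights are equally possible.
state_a, state_b = student_a.state_dict(), student_b.state_dict()
merged = {name: (state_a[name] + state_b[name]) / 2 for name in state_a}

student_a.load_state_dict(merged)
student_a.save_pretrained("distillchat_v2.3_merged")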

Model description

  • Model type: Distill-SFT
  • Language(s) (NLP): Primarily English
  • Finetuned from model: openchat/openchat_3.5

Model Sources

Performance

TBD

Input Format

The model is trained to use the following format:

GPT4 Correct User: Your message here!<|end_of_turn|>GPT4 Correct Assistant:

For best results, format all inputs in this manner.

Intended uses & limitations

The model was initially fine-tuned on a filtered and preprocessed version of the DistillChat V1 Mixture, which contains a diverse range of human-created instructions and synthetic dialogues generated primarily by other LLMs.

Here's how you can run the model using the 🤗 Transformers library:

import transformers
tokenizer = transformers.AutoTokenizer.from_pretrained("Wanfq/distillchat_v2.3")
# Single-turn
tokens = tokenizer("GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:").input_ids
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]
# Multi-turn
tokens = tokenizer("GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:").input_ids
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747, 15359, 32000, 420, 6316, 28781, 3198, 3123, 1247, 28747, 1602, 460, 368, 3154, 28804, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]

The GPT4 template is also available as the integrated tokenizer.chat_template, which can be used instead of manually specifying the template:

messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "How are you today?"}
]
tokens = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747, 15359, 32000, 420, 6316, 28781, 3198, 3123, 1247, 28747, 1602, 460, 368, 3154, 28804, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]
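
To generate a reply rather than just inspect token ids, a standard Transformers generation loop can be used. The following is a minimal sketch; the generation settings are illustrative defaults, not values recommended by this card, and stopping on <|end_of_turn|> is an assumption based on the format shown above:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Wanfq/distillchat_v2.3")
model = AutoModelForCausalLM.from_pretrained("Wanfq/distillchat_v2.3", torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "How are you today?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

# <|end_of_turn|> closes the assistant turn, so treat it as the end-of-sequence token.
output = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|end_of_turn|>"),
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))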

Bias, Risks, and Limitations

The DistillChat models have not been aligned to generate safe completions with an RLHF phase, nor deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). The size and composition of the corpus used to train the base Mistral models are also unknown, but it likely included a mix of web data and technical sources such as books and code. See the Falcon 180B model card for an example of this.

Training hyperparameters

The following hyperparameters were used during Distill-SFT training (a sketch of how the distillation settings might combine appears after the list):

  • learning_rate: 2e-06
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1.0
  • model_max_length: 2048
  • do_distill: True
  • distill_with_ref_model: False
  • distill_with_aligned_model_0: True
  • distill_with_aligned_model_1: False
  • distill_optimize_type: ranking
  • distill_weight_type: soft_metric
  • distill_loss_type: ce
  • distill_teacher_temperature: 1.0
  • lm_loss_weight: 0.7
  • distill_greater_as_gt: True
  • distill_greater_as_gt_type: hard_and_decay
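
This card does not include the loss implementation. As a hedged sketch only, the settings above (distill_loss_type: ce, distill_teacher_temperature, lm_loss_weight) suggest a token-level combination along these lines; the function name and the omission of the ranking/soft_metric options are assumptions:

import torch
import torch.nn.functional as F

def distill_sft_loss(student_logits, teacher_logits, labels, lm_loss_weight=0.7, temperature=1.0):
    # Standard SFT cross-entropy against the gold next tokens (positions labeled -100 are ignored).
    lm_loss = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        labels.reshape(-1),
        ignore_index=-100,
    )
    # Cross-entropy between temperature-scaled teacher and student token distributions.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    kd_loss = -(teacher_probs * student_log_probs).sum(dim=-1).mean()
    # Weighted combination; with lm_loss_weight=0.7 the SFT term dominates.
    return lm_loss_weight * lm_loss + (1.0 - lm_loss_weight) * kd_loss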

Citation

TBD

Acknowledgements

TBD

Model card adapted from Tulu 2
