Quantization made by Richard Erkhov.

redpajama-3b-chat - bnb 4bits

Original model description:

license: cc-by-nc-2.0
language:
  - en
  - zh
  - ja
tags:
  - sft
pipeline_tag: text-generation
widget:
  - text: >-
      <|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>
  - text: <|prompter|>What's the Earth total population<|endoftext|><|assistant|>
  - text: >-
      <|prompter|>Write a story about future of AI development<|endoftext|><|assistant|>
datasets:
  - OpenAssistant/oasst1
  - databricks/databricks-dolly-15k
  - anon8231489123/ShareGPT_Vicuna_unfiltered
  - LIUM/tedlium
  - theblackcat102/joke_explaination

Redpajama-3B SFT model

This model is based on RedPajama's 3B model and was fine-tuned on human demonstrations of assistant conversations collected through the https://open-assistant.io/ human feedback web app before April 12, 2023.

Supervised fine-tuning was run with a sequence length of 5120.

Model Details

Prompting

Two special tokens are used to mark the beginning of user and assistant turns: <|prompter|> and <|assistant|>. Each turn ends with a <|endoftext|> token.

Input prompt example:

<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>

The input ends with the <|assistant|> token to signal that the model should start generating the assistant reply.
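
The format above can be assembled with a small helper. This is a minimal sketch, not code from the original repository: the function name `build_prompt` and the `history` parameter are illustrative, and the multi-turn layout is inferred from the rule that every turn ends with `<|endoftext|>`.

```python
from typing import Sequence, Tuple

PROMPTER = "<|prompter|>"
ASSISTANT = "<|assistant|>"
EOS = "<|endoftext|>"


def build_prompt(user_message: str, history: Sequence[Tuple[str, str]] = ()) -> str:
    """Format a conversation with the special tokens described in this card.

    history holds earlier (user, assistant) exchanges; user_message is the
    new user turn that the model should answer.
    """
    parts = []
    for user_text, assistant_text in history:
        # Every completed turn is closed with the end-of-text token.
        parts.append(f"{PROMPTER}{user_text}{EOS}")
        parts.append(f"{ASSISTANT}{assistant_text}{EOS}")
    # The last user turn is closed, then the assistant token is left open
    # so that generation continues as the assistant reply.
    parts.append(f"{PROMPTER}{user_message}{EOS}{ASSISTANT}")
    return "".join(parts)


prompt = build_prompt("What is a meme, and what's the history behind this word?")
print(prompt)
```

With an empty history, the result is the single-turn example shown above. Pass earlier exchanges as `history=[("question", "answer"), ...]` to continue a conversation.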

Benchmark

| model | MMLU | BBH | HumanEval @10 |
|---|---|---|---|
| ikala/redpajama-3b-chat | 24.6 | 29.3 | 4.8 |
| ikala/bloom-zh-3b-chat | 31.4 | 30.2 | 0.0 |
| llama-7b (reference) | 30.9 | 27.6 | 10.3 |

Dev Details

command: deepspeed trainer_sft.py --configs defaults redpajama-3b datasets --num_train_epochs 2 --deepspeed

data:

datasets:
  - wmt2019_zh-en:
      max_val_set: 1000
      max_train_set: 20000
  - ted_trans_en-ja:
      max_val_set: 1000
      max_train_set: 20000
  - ted_trans_zh-ja:
      max_val_set: 1000
      max_train_set: 20000
  - ikala:
      input_file_path: export_conversation_v4.4.jsonl
      val_split: 0.05
  - dolly15k:
      val_split: 0.05
  - oasst_export:
      lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk,zh,ja,th,ko"
      input_file_path: 2023-04-12_oasst_release_ready_synth.jsonl.gz
      val_split: 0.05
  - joke
  - gsm8k
  - webgpt

Note: ikala is an internal dataset. To reproduce this run, remove it from the dataset list.
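
One way to drop the internal entry programmatically is sketched below. It assumes the YAML list has already been parsed into Python objects, where each entry is either a plain name or a single-key mapping; the helper names `dataset_name` and `drop_datasets` are illustrative and not part of the training code.

```python
from typing import Any, List, Set


def dataset_name(entry: Any) -> str:
    """Return the dataset name for a plain string or a single-key mapping."""
    if isinstance(entry, dict):
        return next(iter(entry))
    return str(entry)


def drop_datasets(entries: List[Any], names_to_drop: Set[str]) -> List[Any]:
    """Keep every entry whose name is not in names_to_drop."""
    return [e for e in entries if dataset_name(e) not in names_to_drop]


# The dataset list from this card, written as parsed Python objects.
datasets = [
    {"wmt2019_zh-en": {"max_val_set": 1000, "max_train_set": 20000}},
    {"ted_trans_en-ja": {"max_val_set": 1000, "max_train_set": 20000}},
    {"ted_trans_zh-ja": {"max_val_set": 1000, "max_train_set": 20000}},
    {"ikala": {"input_file_path": "export_conversation_v4.4.jsonl", "val_split": 0.05}},
    {"dolly15k": {"val_split": 0.05}},
    {"oasst_export": {
        "lang": "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk,zh,ja,th,ko",
        "input_file_path": "2023-04-12_oasst_release_ready_synth.jsonl.gz",
        "val_split": 0.05,
    }},
    "joke",
    "gsm8k",
    "webgpt",
]

public_datasets = drop_datasets(datasets, {"ikala"})
print([dataset_name(e) for e in public_datasets])
```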

redpajama-3b:

redpajama-3b:
  dtype: fp16
  log_dir: "redpajama_3b"
  learning_rate: 1e-5
  model_name: saved_models/RedPajama-INCITE-Base-3B-v1
  output_dir: ikala_v4_3b
  weight_decay: 0.0
  max_length: 8196
  warmup_steps: 2000
  gradient_checkpointing: true
  gradient_accumulation_steps: 32
  per_device_train_batch_size: 1
  per_device_eval_batch_size: 2
  eval_steps: 500
  save_steps: 1000
  num_train_epochs: 8
  save_total_limit: 2
  deepspeed_config: configs/zero3_config_sft.json
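
The batch settings above determine how many sequences contribute to each optimizer step. The sketch below applies the DeepSpeed relation train_batch_size = micro batch per GPU × gradient accumulation steps × number of GPUs. The GPU counts are hypothetical, since the card does not state how many devices were used.

```python
def sequences_per_optimizer_step(per_device_batch: int, grad_accum: int, num_gpus: int) -> int:
    """Number of sequences that contribute to one optimizer update."""
    return per_device_batch * grad_accum * num_gpus


# Values taken from the redpajama-3b config above.
per_device_train_batch_size = 1
gradient_accumulation_steps = 32
max_length = 8196

# Hypothetical GPU counts: the card does not say how many were used.
for num_gpus in (1, 8):
    seqs = sequences_per_optimizer_step(
        per_device_train_batch_size, gradient_accumulation_steps, num_gpus
    )
    # Upper bound only: real sequences are usually shorter than max_length.
    print(f"{num_gpus} GPU(s): {seqs} sequences per step, at most {seqs * max_length} tokens")
```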

zero config:

{
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bf16": {
    "enabled": "auto"
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "scheduler": {
    "type": "WarmupDecayLR",
    "params": {
      "warmup_min_lr": "auto",
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto",
      "warmup_type": "linear",
      "total_num_steps": "auto"
    }
  },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 2000,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
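
Many fields in this config are set to "auto". In the Hugging Face Transformers DeepSpeed integration, that placeholder is replaced with values from the training arguments. Whether trainer_sft.py relies on that integration is an assumption here, as the card does not say. The sketch below lists which fields are left to be filled in. It runs on a shortened excerpt of the config, and the helper name `find_auto_fields` is illustrative.

```python
from typing import Any, Dict, List


def find_auto_fields(config: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted paths of every entry whose value is the string "auto"."""
    paths = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections such as "optimizer.params".
            paths.extend(find_auto_fields(value, path))
        elif value == "auto":
            paths.append(path)
    return paths


# Shortened excerpt of the ZeRO config above (not the full file).
zero_config_excerpt = {
    "fp16": {"enabled": "auto", "loss_scale": 0},
    "optimizer": {"type": "AdamW", "params": {"lr": "auto", "weight_decay": "auto"}},
    "zero_optimization": {"stage": 3, "reduce_bucket_size": "auto"},
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

for path in find_auto_fields(zero_config_excerpt):
    print(path)
```

To check the full file, load it with `json.load` and pass the resulting dictionary to `find_auto_fields`.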

Model size (Safetensors): 1.56B params. Tensor types: F32, FP16, U8.