- license: other
- wandb (still internal, needs to be moved to public-sft)
- checkpoint: 3319 steps

Note

To load this model you need to install a pre-release version of the Hugging Face transformers library.
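As a minimal sketch, installing transformers from source and loading the checkpoint could look like the snippet below. The repository id, prompt template, and generation settings are assumptions for illustration, not taken from this card; substitute the values that apply to your setup.

```python
# Install a pre-release build of transformers from the main branch first, e.g.:
#   pip install git+https://github.com/huggingface/transformers.git

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id for this checkpoint; replace with the repo you actually use.
model_id = "OpenAssistant/llama2-13b-orca-8k-3319"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # the model was trained in fp16 (see config below)
    device_map="auto",          # requires the `accelerate` package
)

# Assumed OpenAssistant-style prompt template; check the card for the exact format.
prompt = "<|prompter|>Summarize the plot of Hamlet.</s><|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```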

Model Configuration

```yaml
llama2_13b_orca_8k:
  rng_seed: 0xe1291f1a
  use_custom_sampler: true
  sort_by_length: false
  dtype: fp16
  log_dir: "llama2_log_13b_orca_8k"
  learning_rate: 1e-5
  model_name: /mnt/data/llama2/Llama-2-13b-hf/
  output_dir: llama2_13b_orca_8k
  deepspeed_config: configs/zero_config_pretrain.json
  weight_decay: 0.0
  max_length: 8192
  warmup_steps: 100
  use_flash_attention: true
  gradient_checkpointing: true
  gradient_accumulation_steps: 8
  per_device_train_batch_size: 2
  per_device_eval_batch_size: 1
  residual_dropout: 0.0
  eval_steps: 200
  save_steps: 1000  # (total steps: 3319)
  num_train_epochs: 1
  save_total_limit: 4
  superhot: true
  superhot_config:
    type: linear
    scale: 2
  datasets:
    # Dataset Composition:
    # Train (sampled):
    #   orca-chat: 100.00% (188842)
    #   fanfics: 100.00% (47760)
    #   red_pajama: 25.00% (188262)
    # Valid:
    #   orca-chat: 5000 (71.43%)
    #   fanfics: 1000 (14.29%)
    #   red_pajama: 1000 (14.29%)
    - orca-chat:
        max_val_set: 5000
    - fanfics:
        max_chunk_size: 65535
        max_val_set: 1000
    - red_pajama:
        fraction: 0.25
        max_val_set: 1000
        max_chunk_size: 65535
  peft_model: false
```
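The `superhot_config` above (type `linear`, scale 2) is linear RoPE position interpolation: positions are compressed by a factor of 2, stretching the 4096-token Llama 2 context window to the 8192-token `max_length` used for training. In recent transformers releases the same behaviour is expressed through the `rope_scaling` config field; the sketch below shows that mapping. The repository id is again an assumption, and the published checkpoint's `config.json` may already contain these values, in which case no manual override is needed.

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Assumed repository id; replace with the repo you are loading from.
model_id = "OpenAssistant/llama2-13b-orca-8k-3319"

config = AutoConfig.from_pretrained(model_id)
# Linear RoPE scaling with factor 2 mirrors `superhot_config: {type: linear, scale: 2}`:
# position ids are effectively divided by 2, so the original 4096-token window
# covers 8192 tokens.
config.rope_scaling = {"type": "linear", "factor": 2.0}
config.max_position_embeddings = 8192

model = AutoModelForCausalLM.from_pretrained(model_id, config=config, torch_dtype="auto")
```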