
Model Card for mzio/hedgehog-mistral_7b-alpaca_clean-smd_lora_1e_3

This model card aims to be a base template for new models. It has been generated using this raw template.

Model Details

Configs

The two configs below correspond to the two training stages: first an attention-distillation config (trainer name: distill_attention), which trains learned linear-attention feature maps against the frozen model's softmax attention, and then a LoRA finetuning config (trainer name: default, with a finetune block) that adapts the converted model on the Alpaca data.

name: llama
model:
  pretrained_model_name_or_path: 'mistralai/Mistral-7B-v0.1'
  cache_dir: '/juice/scr/scr110/scr/nlp/data/neo/hub/'
  return_dict: true
  quantization: false
  device_map: auto  # null
  low_cpu_mem_usage: true  # false
  torch_dtype: bfloat16
  attn_implementation: eager  # so we can load attention weights
  rope_theta: 10000.0

attention:
  attention_type: hedgehog_llama
  feature_map: softmax_dim
  feature_map_kwargs:
    input_dim: 128
    eps: 1e-12
    # mlp: null  # to set
    fullspace: true
  layer_idx: null  # to set
  learned_kernel: untied_head
  learned_kernel_kwargs:
    feature_dim: 128
    skip_connection: false
    bias: false
    zero_init: false
  tie_qk_kernels: false
  train_qk: true
  peft:
    method: lora
    kwargs:
      r: 8  # 256
      lora_alpha: 16  # 512
      lora_dropout: 0.1  # 0.05
      target_modules: ['self_attn.q_proj', 'self_attn.k_proj']

dataset:
  name: alpaca_clean
  dataset_config:
    name: alpaca
    path: yahma/alpaca-cleaned
    chunk_size: 1024  # 2048
    concat_data: true
    cache_dir: '/u/scr/nlp/data/alpaca'
  pretrained_model_config:
    pretrained_model_name_or_path: 'mistralai/Mistral-7B-v0.1'
    cache_dir: '/juice/scr/scr110/scr/nlp/data/neo/hub/'
  preprocess_config: null

dataloader:
  batch_size: 1
  num_workers: 2
  drop_last: false
  pin_memory: true

optimizer:
  optim: adamw_torch_fused
  lr: 0.001
  weight_decay: 0.0

lr_scheduler:
  lr_scheduler_type: reduce_lr_on_plateau
  mode: min
  factor: 0.1
  patience: 10
  min_lr: 0.00001

trainer:  # HuggingFace Trainer-like arguments  
  name: distill_attention
  token_reduce: true
  bottom_attention_only: false
  reverse_kl: false
  
  bf16: true
  train_split: train
  val_split: validation
  num_train_epochs: 2
  gradient_accumulation_steps: 8
  seed: 42
  batch_size: 1
  load_best_model_at_end: true
  greater_is_better: false
  metric_for_best_model: distill/eval/loss
  logging_steps: 100
  evaluation_strategy: steps
  max_steps: -1
  eval_steps: 100
  max_eval_batches: null
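
The attention config above replaces softmax attention with a trainable linear-attention feature map (feature_map: softmax_dim with fullspace: true and an untied_head learned kernel). Below is a minimal sketch of what such a feature map and the resulting linear attention could look like, assuming the Hedgehog formulation in which a learned per-head projection is passed through a softmax over the feature dimension and concatenated with the softmax of its negation; the class and function names are illustrative and are not the repository's actual API.

# Illustrative sketch only -- names and shapes are assumptions, not the repo's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftmaxDimFeatureMap(nn.Module):
    """Learned feature map: softmax over the feature dimension of a linear
    projection, concatenated with the softmax of its negation ("fullspace")."""
    def __init__(self, head_dim: int = 128, feature_dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(head_dim, feature_dim, bias=False)  # bias: false in the config

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, heads, seq_len, head_dim)
        z = self.proj(x)
        return torch.cat([F.softmax(z, dim=-1), F.softmax(-z, dim=-1)], dim=-1)

def linear_attention(q, k, v, feature_map, eps=1e-12):
    # Non-causal linear attention: softmax(QK^T)V is approximated by
    # phi(Q) (phi(K)^T V) / (phi(Q) phi(K)^T 1), with eps from the config for stability.
    q, k = feature_map(q), feature_map(k)
    kv = torch.einsum('bhnf,bhnd->bhfd', k, v)                      # phi(K)^T V
    z = 1.0 / (torch.einsum('bhnf,bhf->bhn', q, k.sum(dim=2)) + eps)
    return torch.einsum('bhnf,bhfd,bhn->bhnd', q, kv, z)

During the distill_attention stage, a feature map like this is trained so that the linear attention weights match the frozen model's softmax attention weights (reverse_kl: false suggests a forward KL / cross-entropy objective); this is also why attn_implementation is set to eager above, so that the reference attention weights can be materialized.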

dataset:
  name: alpaca_clean
  dataset_config:
    name: alpaca
    path: yahma/alpaca-cleaned
    chunk_size: 1024  # 2048
    concat_data: true
    cache_dir: '/u/scr/nlp/data/alpaca'
  pretrained_model_config:
    pretrained_model_name_or_path: 'mistralai/Mistral-7B-v0.1'
    cache_dir: '/juice/scr/scr110/scr/nlp/data/neo/hub/'
  preprocess_config: null    

dataloader:
  batch_size: 1
  num_workers: 2
  drop_last: false
  pin_memory: true

optimizer:
  optim: adamw_torch_fused
  lr: 1e-4
  weight_decay: 0.0

lr_scheduler:
  lr_scheduler_type: reduce_lr_on_plateau
  mode: min
  factor: 0.1
  patience: 10
  min_lr: 0.00001

trainer:  # HuggingFace Trainer-like arguments  
  name: default
  bf16: true
  train_split: train
  val_split: validation
  num_train_epochs: 2
  gradient_accumulation_steps: 8
  seed: 42
  batch_size: 1
  load_best_model_at_end: true
  greater_is_better: false
  metric_for_best_model: eval/loss  # eval/rouge/geometric_mean
  logging_steps: 100
  evaluation_strategy: steps
  max_steps: -1
  eval_steps: 100
  max_eval_batches: null

finetune:
  method: lora
  kwargs:
    r: 8
    lora_alpha: 16  # 32
    lora_dropout: 0  # 0.05
    target_modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj']
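
For the second stage, the finetune block maps directly onto a standard PEFT LoRA setup. A minimal sketch of applying those kwargs with the peft library is shown below; loading the vanilla Mistral checkpoint here is a simplification, since in the actual pipeline the adapters would be added to the distilled linear-attention model.

# Sketch of applying finetune.kwargs with the peft library; task_type is an assumption.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    'mistralai/Mistral-7B-v0.1',
    torch_dtype=torch.bfloat16,
    device_map='auto',
)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
    task_type='CAUSAL_LM',
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters remain trainable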

Model Description

  • Developed by: [More Information Needed]
  • Funded by [optional]: [More Information Needed]
  • Shared by [optional]: [More Information Needed]
  • Model type: [More Information Needed]
  • Language(s) (NLP): [More Information Needed]
  • License: [More Information Needed]
  • Finetuned from model [optional]: [More Information Needed]

Model Sources [optional]

  • Repository: [More Information Needed]
  • Paper [optional]: [More Information Needed]
  • Demo [optional]: [More Information Needed]

Uses

Direct Use

[More Information Needed]

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
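
Until an official snippet is provided, the following is a hedged sketch of loading the checkpoint with transformers, reusing the dtype from the configs above. It is an assumption that the uploaded weights load as a standard Mistral-architecture checkpoint; the linearized-attention layers may instead require the project's own conversion code, and the tokenizer is taken from the base model.

# Hedged sketch -- loading via plain transformers is an assumption, not a documented path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'mzio/hedgehog-mistral_7b-alpaca_clean-smd_lora_1e_3'
tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-v0.1')  # base-model tokenizer (assumption)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map='auto',
)

prompt = 'Below is an instruction. Write a response.\n\n### Instruction:\nName three primary colors.\n\n### Response:\n'
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))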

Training Details

Training Data

[More Information Needed]

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

  • Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: [More Information Needed]
  • Hours used: [More Information Needed]
  • Cloud Provider: [More Information Needed]
  • Compute Region: [More Information Needed]
  • Carbon Emitted: [More Information Needed]

Technical Specifications [optional]

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

[More Information Needed]

Hardware

[More Information Needed]

Software

[More Information Needed]

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]
