
Optimum Habana is the interface between the Hugging Face Transformers and Diffusers libraries and Habana's Gaudi processor (HPU). It provides a set of tools enabling easy and fast model loading, training and inference on single- and multi-HPU settings for different downstream tasks. Learn more about how to take advantage of the power of Habana HPUs to train and deploy Transformers and Diffusers models at hf.co/hardware/habana.

Falcon model HPU configuration

This model only contains the GaudiConfig file for running Falcon models on Habana's Gaudi processors (HPU).

This model contains no model weights, only a GaudiConfig.

The GaudiConfig lets you specify the following options (see the sketch after this list):

  • use_fused_adam: whether to use Habana's custom AdamW implementation
  • use_fused_clip_norm: whether to use Habana's fused gradient norm clipping operator
  • use_torch_autocast: whether to use PyTorch's autocast mixed precision
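
As a minimal sketch, here is how such a Gaudi configuration can be loaded and inspected with the optimum-habana Python API. The repository name "Habana/falcon" is an assumption used only for illustration; substitute the actual name of this repository.

from optimum.habana import GaudiConfig

# Load the GaudiConfig stored in a Hub repository
# ("Habana/falcon" is an assumed name used only for this sketch).
gaudi_config = GaudiConfig.from_pretrained("Habana/falcon")

# The three flags listed above
print(gaudi_config.use_fused_adam)
print(gaudi_config.use_fused_clip_norm)
print(gaudi_config.use_torch_autocast)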

Usage

The model is instantiated the same way as in the Transformers library; the only difference is a few new training arguments specific to HPUs, as illustrated in the sketch below.
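
The following sketch shows those HPU-specific arguments (use_habana, use_lazy_mode and gaudi_config_name) with GaudiTrainer. The model, dataset and gaudi_config_name values are assumptions chosen for illustration, not requirements of this repository.

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
)
from optimum.habana import GaudiTrainer, GaudiTrainingArguments

# Model and tokenizer are loaded exactly as with the Transformers library.
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b")
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
tokenizer.pad_token = tokenizer.eos_token  # Falcon tokenizers define no pad token

# HPU-specific arguments: use_habana, use_lazy_mode and gaudi_config_name;
# everything else mirrors transformers.TrainingArguments.
training_args = GaudiTrainingArguments(
    output_dir="./falcon_hpu",
    use_habana=True,
    use_lazy_mode=True,
    gaudi_config_name="Habana/falcon",  # assumed repository name for this sketch
    bf16=True,
    per_device_train_batch_size=1,
    num_train_epochs=1,
)

# A small instruction dataset, chosen only for illustration.
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")
dataset = dataset.map(
    lambda examples: tokenizer(examples["text"], truncation=True, max_length=256),
    batched=True,
    remove_columns=dataset.column_names,
)

# GaudiTrainer is used like transformers.Trainer.
trainer = GaudiTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()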

Here is a causal language modeling example script to pre-train/fine-tune a model. You can run it on Falcon with the following command:

LOWER_LIST=ops_bf16.txt python3 run_lora_clm.py \
    --model_name_or_path tiiuae/falcon-40b \
    --dataset_name timdettmers/openassistant-guanaco \
    --bf16 True \
    --output_dir ./model_lora_falcon \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "no" \
    --learning_rate 3e-4 \
    --max_grad_norm  0.3 \
    --warmup_ratio  0.03 \
    --lr_scheduler_type "constant" \
    --logging_steps 1 \
    --do_train \
    --use_habana \
    --use_lazy_mode \
    --pipelining_fwd_bwd \
    --throughput_warmup_steps 3 \
    --lora_rank=64 \
    --lora_alpha=16 \
    --lora_dropout=0.1 \
    --lora_target_modules "query_key_value" "dense" "dense_h_to_4h" "dense_4h_to_h" \
    --dataset_concatenation \
    --max_seq_length 256 \
    --low_cpu_mem_usage True \
    --adam_epsilon 1e-08 \
    --do_eval \
    --validation_split_percentage 5

You will need to install the PEFT library with pip install peft to run the command above.

Check out the documentation for more advanced usage and examples.
