LLM Finetuning

With AutoTrain, you can easily finetune large language models (LLMs) on your own data!

AutoTrain supports the following types of LLM finetuning:

Causal Language Modeling (CLM)
Masked Language Modeling (MLM) [Coming Soon]

Data Preparation

LLM finetuning accepts data in CSV or JSONL format.

Data Format For SFT / Generic Trainer

For SFT / Generic Trainer, the data should be in the following format:

text
human: hello \n bot: hi nice to meet you
human: how are you \n bot: I am fine
human: What is your name? \n bot: My name is Mary
human: Which is the best programming language? \n bot: Python

An example dataset for this format can be found here: https://huggingface.co/datasets/timdettmers/openassistant-guanaco

For SFT/Generic training, your dataset must have a text column.
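As a minimal sketch of producing such a file, the snippet below writes a one-column CSV with Python's standard csv module and reads it back. The sample rows and the helper name write_sft_csv are illustrative only, and it assumes the "\n" shown in the table stands for a real newline inside the cell.

```python
import csv
import io

# Hypothetical sample conversations; each row is one full training example.
# The "\n" here is a real newline character inside the cell (an assumption
# about how the table above should be read).
rows = [
    {"text": "human: hello \n bot: hi nice to meet you"},
    {"text": "human: how are you \n bot: I am fine"},
]


def write_sft_csv(rows, handle):
    """Write rows to a CSV that has a single 'text' column."""
    writer = csv.DictWriter(handle, fieldnames=["text"])
    writer.writeheader()
    writer.writerows(rows)


# An in-memory buffer stands in for a file such as train.csv.
buffer = io.StringIO()
write_sft_csv(rows, buffer)

# Read the CSV back to confirm the embedded newlines survived quoting.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
print(loaded[0]["text"])
```

The csv module quotes cells that contain newlines or commas, so multi-turn conversations stay in a single cell.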

Data Format For Reward Trainer

For Reward Trainer, the data should be in the following format:

text	rejected_text
human: hello \n bot: hi nice to meet you	human: hello \n bot: leave me alone
human: how are you \n bot: I am fine	human: how are you \n bot: I am not fine
human: What is your name? \n bot: My name is Mary	human: What is your name? \n bot: Whats it to you?
human: Which is the best programming language? \n bot: Python	human: Which is the best programming language? \n bot: Javascript

For Reward Trainer, your dataset must have a text column (aka chosen text) and a rejected_text column.
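One way to build these two columns is to start from (prompt, chosen, rejected) triples and format both conversations from the same prompt. This is only a sketch: the helper to_reward_row and the sample pairs are hypothetical, and the "human: ... \n bot: ..." layout simply mirrors the table above.

```python
import csv
import io


def to_reward_row(prompt, chosen, rejected):
    """Format one preference pair as full conversations for both columns."""
    return {
        "text": f"human: {prompt} \n bot: {chosen}",
        "rejected_text": f"human: {prompt} \n bot: {rejected}",
    }


# Hypothetical preference pairs: (prompt, chosen reply, rejected reply).
pairs = [
    ("hello", "hi nice to meet you", "leave me alone"),
    ("how are you", "I am fine", "I am not fine"),
]
reward_rows = [to_reward_row(*pair) for pair in pairs]

# Write the rows to an in-memory CSV with the two required columns.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "rejected_text"])
writer.writeheader()
writer.writerows(reward_rows)
reward_csv = buffer.getvalue()
```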

Data Format For DPO Trainer

For DPO Trainer, the data should be in the following format:

prompt	text	rejected_text
hello	hi nice to meet you	leave me alone
how are you	I am fine	I am not fine
What is your name?	My name is Mary	Whats it to you?
What is your name?	My name is Mary	I dont have a name
Which is the best programming language?	Python	Javascript
Which is the best programming language?	Python	C++
Which is the best programming language?	Java	C++

For DPO Trainer, your dataset must have a prompt column, a text column (aka chosen text) and a rejected_text column.
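The column requirements for the three formats above can be checked before training with a small helper like the one below. It is a sketch, not part of AutoTrain: REQUIRED_COLUMNS and missing_columns are hypothetical names, and mapping the "Generic" trainer to the CLI's default trainer is an assumption.

```python
# Required columns per trainer, taken from the format descriptions above.
# Treating "Generic" as the CLI's `default` trainer is an assumption.
REQUIRED_COLUMNS = {
    "default": {"text"},
    "sft": {"text"},
    "reward": {"text", "rejected_text"},
    "dpo": {"prompt", "text", "rejected_text"},
}


def missing_columns(trainer, columns):
    """Return the required columns that are absent, sorted by name."""
    return sorted(REQUIRED_COLUMNS[trainer] - set(columns))


# A DPO dataset that lacks the rejected_text column.
print(missing_columns("dpo", ["prompt", "text"]))
```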

For all tasks, you can use both CSV and JSONL files!
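For JSONL, a reasonable reading is one JSON object per line, with keys matching the column names used in the CSV examples. The sketch below assumes that layout (the page does not spell it out) and uses a hypothetical helper, to_jsonl, with DPO-style sample rows.

```python
import json

# Hypothetical DPO rows; keys mirror the CSV column names (an assumption).
dpo_rows = [
    {"prompt": "hello", "text": "hi nice to meet you", "rejected_text": "leave me alone"},
    {"prompt": "how are you", "text": "I am fine", "rejected_text": "I am not fine"},
]


def to_jsonl(rows):
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


jsonl_text = to_jsonl(dpo_rows)

# Parse it back line by line to confirm the round trip.
parsed = [json.loads(line) for line in jsonl_text.splitlines()]
```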

Parameters

❯ autotrain llm --help
usage: autotrain <command> [<args>] llm [-h] [--train] [--deploy] [--inference] [--username USERNAME]
                                        [--backend {local-cli,spaces-a10gl,spaces-a10gs,spaces-a100,spaces-t4m,spaces-t4s,spaces-cpu,spaces-cpuf}]
                                        [--token TOKEN] [--push-to-hub] --model MODEL --project-name PROJECT_NAME [--data-path DATA_PATH]
                                        [--train-split TRAIN_SPLIT] [--valid-split VALID_SPLIT] [--batch-size BATCH_SIZE] [--seed SEED]
                                        [--epochs EPOCHS] [--gradient_accumulation GRADIENT_ACCUMULATION] [--disable_gradient_checkpointing]
                                        [--lr LR] [--log {none,wandb,tensorboard}] [--text_column TEXT_COLUMN]
                                        [--rejected_text_column REJECTED_TEXT_COLUMN] [--prompt-text-column PROMPT_TEXT_COLUMN]
                                        [--model-ref MODEL_REF] [--warmup_ratio WARMUP_RATIO] [--optimizer OPTIMIZER] [--scheduler SCHEDULER]
                                        [--weight_decay WEIGHT_DECAY] [--max_grad_norm MAX_GRAD_NORM] [--add_eos_token] [--block_size BLOCK_SIZE]
                                        [--peft] [--lora_r LORA_R] [--lora_alpha LORA_ALPHA] [--lora_dropout LORA_DROPOUT]
                                        [--logging_steps LOGGING_STEPS] [--evaluation_strategy {epoch,steps,no}]
                                        [--save_total_limit SAVE_TOTAL_LIMIT] [--save_strategy {epoch,steps}] [--auto_find_batch_size]
                                        [--mixed_precision {fp16,bf16,None}] [--quantization {int4,int8,None}] [--model_max_length MODEL_MAX_LENGTH]
                                        [--max_prompt_length MAX_PROMPT_LENGTH] [--max_completion_length MAX_COMPLETION_LENGTH]
                                        [--trainer {default,dpo,sft,orpo,reward}] [--target_modules TARGET_MODULES] [--merge_adapter]
                                        [--use_flash_attention_2] [--dpo-beta DPO_BETA] [--chat_template {tokenizer,chatml,zephyr,None}]
                                        [--padding {left,right,None}]

✨ Run AutoTrain LLM

options:
  -h, --help            show this help message and exit
  --train               Command to train the model
  --deploy              Command to deploy the model (limited availability)
  --inference           Command to run inference (limited availability)
  --username USERNAME   Hugging Face Hub Username
  --backend {local-cli,spaces-a10gl,spaces-a10gs,spaces-a100,spaces-t4m,spaces-t4s,spaces-cpu,spaces-cpuf}
                        Backend to use: default or spaces. Spaces backend requires push_to_hub & username. Advanced users only.
  --token TOKEN         Your Hugging Face API token. Token must have write access to the model hub.
  --push-to-hub         Push to hub after training will push the trained model to the Hugging Face model hub.
  --model MODEL         Base model to use for training
  --project-name PROJECT_NAME
                        Output directory / repo id for trained model (must be unique on hub)
  --data-path DATA_PATH
                        Train dataset to use. When using cli, this should be a directory path containing training and validation data in appropriate
                        formats
  --train-split TRAIN_SPLIT
                        Train dataset split to use
  --valid-split VALID_SPLIT
                        Validation dataset split to use
  --batch-size BATCH_SIZE, --train-batch-size BATCH_SIZE
                        Training batch size to use
  --seed SEED           Random seed for reproducibility
  --epochs EPOCHS       Number of training epochs
  --gradient_accumulation GRADIENT_ACCUMULATION, --gradient-accumulation GRADIENT_ACCUMULATION
                        Gradient accumulation steps
  --disable_gradient_checkpointing, --disable-gradient-checkpointing, --disable-gc
                        Disable gradient checkpointing
  --lr LR               Learning rate
  --log {none,wandb,tensorboard}
                        Use experiment tracking
  --text_column TEXT_COLUMN, --text-column TEXT_COLUMN
                        Specify the dataset column to use for text data. This parameter is essential for models processing textual information.
                        Default is 'text'.
  --rejected_text_column REJECTED_TEXT_COLUMN, --rejected-text-column REJECTED_TEXT_COLUMN
                        Define the column to use for storing rejected text entries, which are typically entries that do not meet certain criteria
                        for processing. Default is 'rejected'. Used only for orpo, dpo and reward trainerss
  --prompt-text-column PROMPT_TEXT_COLUMN, --prompt-text-column PROMPT_TEXT_COLUMN
                        Identify the column that contains prompt text for tasks requiring contextual inputs, such as conversation or completion
                        generation. Default is 'prompt'. Used only for dpo trainer
  --model-ref MODEL_REF
                        Reference model to use for DPO when not using PEFT
  --warmup_ratio WARMUP_RATIO, --warmup-ratio WARMUP_RATIO
                        Set the proportion of training allocated to warming up the learning rate, which can enhance model stability and performance
                        at the start of training. Default is 0.1
  --optimizer OPTIMIZER
                        Choose the optimizer algorithm for training the model. Different optimizers can affect the training speed and model
                        performance. 'adamw_torch' is used by default.
  --scheduler SCHEDULER
                        Select the learning rate scheduler to adjust the learning rate based on the number of epochs. 'linear' decreases the
                        learning rate linearly from the initial lr set. Default is 'linear'. Try 'cosine' for a cosine annealing schedule.
  --weight_decay WEIGHT_DECAY, --weight-decay WEIGHT_DECAY
                        Define the weight decay rate for regularization, which helps prevent overfitting by penalizing larger weights. Default is
                        0.0
  --max_grad_norm MAX_GRAD_NORM, --max-grad-norm MAX_GRAD_NORM
                        Set the maximum norm for gradient clipping, which is critical for preventing gradients from exploding during
                        backpropagation. Default is 1.0.
  --add_eos_token, --add-eos-token
                        Toggle whether to automatically add an End Of Sentence (EOS) token at the end of texts, which can be critical for certain
                        types of models like language models. Only used for `default` trainer
  --block_size BLOCK_SIZE, --block-size BLOCK_SIZE
                        Specify the block size for processing sequences. This is maximum sequence length or length of one block of text. Setting to
                        -1 determines block size automatically. Default is -1.
  --peft, --use-peft    Enable LoRA-PEFT
  --lora_r LORA_R, --lora-r LORA_R
                        Set the 'r' parameter for Low-Rank Adaptation (LoRA). Default is 16.
  --lora_alpha LORA_ALPHA, --lora-alpha LORA_ALPHA
                        Specify the 'alpha' parameter for LoRA. Default is 32.
  --lora_dropout LORA_DROPOUT, --lora-dropout LORA_DROPOUT
                        Set the dropout rate within the LoRA layers to help prevent overfitting during adaptation. Default is 0.05.
  --logging_steps LOGGING_STEPS, --logging-steps LOGGING_STEPS
                        Determine how often to log training progress in terms of steps. Setting it to '-1' determines logging steps automatically.
  --evaluation_strategy {epoch,steps,no}, --evaluation-strategy {epoch,steps,no}
                        Choose how frequently to evaluate the model's performance, with 'epoch' as the default, meaning at the end of each training
                        epoch
  --save_total_limit SAVE_TOTAL_LIMIT, --save-total-limit SAVE_TOTAL_LIMIT
                        Limit the total number of saved model checkpoints to manage disk usage effectively. Default is to save only the latest
                        checkpoint
  --save_strategy {epoch,steps}, --save-strategy {epoch,steps}
                        Define the checkpoint saving strategy, with 'epoch' as the default, saving checkpoints at the end of each training epoch.
  --auto_find_batch_size, --auto-find-batch-size
                        Automatically determine the optimal batch size based on system capabilities to maximize efficiency.
  --mixed_precision {fp16,bf16,None}, --mixed-precision {fp16,bf16,None}
                        Choose the precision mode for training to optimize performance and memory usage. Options are 'fp16', 'bf16', or None for
                        default precision. Default is None.
  --quantization {int4,int8,None}, --quantization {int4,int8,None}
                        Choose the quantization level to reduce model size and potentially increase inference speed. Options include 'int4', 'int8',
                        or None. Enabling requires --peft
  --model_max_length MODEL_MAX_LENGTH, --model-max-length MODEL_MAX_LENGTH
                        Set the maximum length for the model to process in a single batch, which can affect both performance and memory usage.
                        Default is 1024
  --max_prompt_length MAX_PROMPT_LENGTH, --max-prompt-length MAX_PROMPT_LENGTH
                        Specify the maximum length for prompts used in training, particularly relevant for tasks requiring initial contextual input.
                        Used only for `orpo` trainer.
  --max_completion_length MAX_COMPLETION_LENGTH, --max-completion-length MAX_COMPLETION_LENGTH
                        Completion length to use, for orpo: encoder-decoder models only
  --trainer {default,dpo,sft,orpo,reward}
                        Trainer type to use
  --target_modules TARGET_MODULES, --target-modules TARGET_MODULES
                        Identify specific modules within the model architecture to target with adaptations or optimizations, such as LoRA. Comma
                        separated list of module names. Default is 'all-linear'.
  --merge_adapter, --merge-adapter
                        Use this flag to merge PEFT adapter with the model
  --use_flash_attention_2, --use-flash-attention-2, --use-fa2
                        Use flash attention 2
  --dpo-beta DPO_BETA, --dpo-beta DPO_BETA
                        Beta for DPO trainer
  --chat_template {tokenizer,chatml,zephyr,None}, --chat-template {tokenizer,chatml,zephyr,None}
                        Apply a specific template for chat-based interactions, with options including 'tokenizer', 'chatml', 'zephyr', or None. This
                        setting can shape the model's conversational behavior.
  --padding {left,right,None}, --padding {left,right,None}
                        Specify the padding direction for sequences, critical for models sensitive to input alignment. Options include 'left',
                        'right', or None
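As an illustration of combining these options, the sketch below assembles an autotrain llm --train command line in Python without running it. Every flag it emits appears in the help output above; the function build_train_command, the model name, the project name, and the data path are hypothetical. It enforces the stated rule that quantization requires --peft. Because the help lists 'rejected' as the default rejected-text column while the data format sections use rejected_text, the column name is passed explicitly here; whether that is needed in practice is an assumption.

```python
import shlex


def build_train_command(model, project_name, data_path, trainer="sft",
                        peft=False, quantization=None,
                        rejected_text_column=None):
    """Assemble an argv list for `autotrain llm --train` from a few options."""
    # The help text states that enabling quantization requires --peft.
    if quantization is not None and not peft:
        raise ValueError("--quantization requires --peft")
    argv = [
        "autotrain", "llm", "--train",
        "--model", model,
        "--project-name", project_name,
        "--data-path", data_path,
        "--trainer", trainer,
    ]
    if peft:
        argv.append("--peft")
    if quantization is not None:
        argv += ["--quantization", quantization]
    if rejected_text_column is not None:
        argv += ["--rejected_text_column", rejected_text_column]
    return argv


# Placeholder values; substitute your own model, project, and data directory.
argv = build_train_command(
    model="my-org/base-model",
    project_name="my-dpo-project",
    data_path="data/",
    trainer="dpo",
    peft=True,
    quantization="int4",
    rejected_text_column="rejected_text",
)
print(shlex.join(argv))
```

Building the command as a list keeps each argument separate, so it can be handed to subprocess.run without shell quoting concerns.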
