AutoTrain documentation

Text Classification & Regression Parameters

You are viewing main version, which requires installation from source. If you'd like regular pip install, checkout the latest stable version (v0.8.21).
Hugging Face's logo
Join the Hugging Face community

and get access to the augmented documentation experience

to get started

Text Classification & Regression Parameters

class autotrain.trainers.text_classification.params.TextClassificationParams

< >

( data_path: str = None model: str = 'bert-base-uncased' lr: float = 5e-05 epochs: int = 3 max_seq_length: int = 128 batch_size: int = 8 warmup_ratio: float = 0.1 gradient_accumulation: int = 1 optimizer: str = 'adamw_torch' scheduler: str = 'linear' weight_decay: float = 0.0 max_grad_norm: float = 1.0 seed: int = 42 train_split: str = 'train' valid_split: Optional = None text_column: str = 'text' target_column: str = 'target' logging_steps: int = -1 project_name: str = 'project-name' auto_find_batch_size: bool = False mixed_precision: Optional = None save_total_limit: int = 1 token: Optional = None push_to_hub: bool = False eval_strategy: str = 'epoch' username: Optional = None log: str = 'none' early_stopping_patience: int = 5 early_stopping_threshold: float = 0.01 )

Parameters

  • data_path (str) — Path to the dataset.
  • model (str) — Name of the model to use. Default is “bert-base-uncased”.
  • lr (float) — Learning rate. Default is 5e-5.
  • epochs (int) — Number of training epochs. Default is 3.
  • max_seq_length (int) — Maximum sequence length. Default is 128.
  • batch_size (int) — Training batch size. Default is 8.
  • warmup_ratio (float) — Warmup proportion. Default is 0.1.
  • gradient_accumulation (int) — Number of gradient accumulation steps. Default is 1.
  • optimizer (str) — Optimizer to use. Default is “adamw_torch”.
  • scheduler (str) — Scheduler to use. Default is “linear”.
  • weight_decay (float) — Weight decay. Default is 0.0.
  • max_grad_norm (float) — Maximum gradient norm. Default is 1.0.
  • seed (int) — Random seed. Default is 42.
  • train_split (str) — Name of the training split. Default is “train”.
  • valid_split (Optional[str]) — Name of the validation split. Default is None.
  • text_column (str) — Name of the text column in the dataset. Default is “text”.
  • target_column (str) — Name of the target column in the dataset. Default is “target”.
  • logging_steps (int) — Number of steps between logging. Default is -1.
  • project_name (str) — Name of the project. Default is “project-name”.
  • auto_find_batch_size (bool) — Whether to automatically find the batch size. Default is False.
  • mixed_precision (Optional[str]) — Mixed precision setting (fp16, bf16, or None). Default is None.
  • save_total_limit (int) — Total number of checkpoints to save. Default is 1.
  • token (Optional[str]) — Hub token for authentication. Default is None.
  • push_to_hub (bool) — Whether to push the model to the hub. Default is False.
  • eval_strategy (str) — Evaluation strategy. Default is “epoch”.
  • username (Optional[str]) — Hugging Face username. Default is None.
  • log (str) — Logging method for experiment tracking. Default is “none”.
  • early_stopping_patience (int) — Number of epochs with no improvement after which training will be stopped. Default is 5.
  • early_stopping_threshold (float) — Threshold for measuring the new optimum to continue training. Default is 0.01.

TextClassificationParams is a configuration class for text classification training parameters.

< > Update on GitHub