AutoTrain documentation

Token Classification Parameters

You are viewing the main version, which requires installation from source. For a regular pip install, check out the latest stable version (v0.8.21).

class autotrain.trainers.token_classification.params.TokenClassificationParams

(
    data_path: str = None,
    model: str = 'bert-base-uncased',
    lr: float = 5e-05,
    epochs: int = 3,
    max_seq_length: int = 128,
    batch_size: int = 8,
    warmup_ratio: float = 0.1,
    gradient_accumulation: int = 1,
    optimizer: str = 'adamw_torch',
    scheduler: str = 'linear',
    weight_decay: float = 0.0,
    max_grad_norm: float = 1.0,
    seed: int = 42,
    train_split: str = 'train',
    valid_split: Optional[str] = None,
    tokens_column: str = 'tokens',
    tags_column: str = 'tags',
    logging_steps: int = -1,
    project_name: str = 'project-name',
    auto_find_batch_size: bool = False,
    mixed_precision: Optional[str] = None,
    save_total_limit: int = 1,
    token: Optional[str] = None,
    push_to_hub: bool = False,
    eval_strategy: str = 'epoch',
    username: Optional[str] = None,
    log: str = 'none',
    early_stopping_patience: int = 5,
    early_stopping_threshold: float = 0.01
)

Parameters

  • data_path (str) — Path to the dataset.
  • model (str) — Name of the model to use. Default is “bert-base-uncased”.
  • lr (float) — Learning rate. Default is 5e-5.
  • epochs (int) — Number of training epochs. Default is 3.
  • max_seq_length (int) — Maximum sequence length. Default is 128.
  • batch_size (int) — Training batch size. Default is 8.
  • warmup_ratio (float) — Fraction of the total training steps used for learning-rate warmup. Default is 0.1.
  • gradient_accumulation (int) — Gradient accumulation steps. Default is 1.
  • optimizer (str) — Optimizer to use. Default is “adamw_torch”.
  • scheduler (str) — Scheduler to use. Default is “linear”.
  • weight_decay (float) — Weight decay. Default is 0.0.
  • max_grad_norm (float) — Maximum gradient norm. Default is 1.0.
  • seed (int) — Random seed. Default is 42.
  • train_split (str) — Name of the training split. Default is “train”.
  • valid_split (Optional[str]) — Name of the validation split. Default is None.
  • tokens_column (str) — Name of the tokens column. Default is “tokens”.
  • tags_column (str) — Name of the tags column. Default is “tags”.
  • logging_steps (int) — Number of steps between logging. Default is -1.
  • project_name (str) — Name of the project. Default is “project-name”.
  • auto_find_batch_size (bool) — Whether to automatically search for a batch size that fits in memory. Default is False.
  • mixed_precision (Optional[str]) — Mixed precision setting (fp16, bf16, or None). Default is None.
  • save_total_limit (int) — Maximum number of checkpoints to keep. Default is 1.
  • token (Optional[str]) — Hub token for authentication. Default is None.
  • push_to_hub (bool) — Whether to push the model to the Hugging Face hub. Default is False.
  • eval_strategy (str) — Evaluation strategy. Default is “epoch”.
  • username (Optional[str]) — Hugging Face username. Default is None.
  • log (str) — Logging method for experiment tracking. Default is “none”.
  • early_stopping_patience (int) — Patience for early stopping. Default is 5.
  • early_stopping_threshold (float) — Threshold for early stopping. Default is 0.01.
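
Several of these parameters interact. As a rough sketch, the number of examples consumed per optimizer update is batch_size × gradient_accumulation, and warmup_ratio is applied to the total number of optimizer steps. The function name and the dataset size of 1,000 examples below are hypothetical, the calculation assumes a single device, and the rounding (ceiling) is an assumption about how the underlying trainer counts steps, not something this page states.

```python
import math


def schedule_summary(num_examples, batch_size=8, gradient_accumulation=1,
                     epochs=3, warmup_ratio=0.1):
    """Estimate step counts from the documented defaults (illustrative only)."""
    # Examples seen per optimizer update on a single device.
    effective_batch_size = batch_size * gradient_accumulation
    # Assumption: a partial final batch still counts as one step.
    steps_per_epoch = math.ceil(num_examples / effective_batch_size)
    total_steps = steps_per_epoch * epochs
    # Assumption: warmup steps are rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch_size": effective_batch_size,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Hypothetical dataset of 1,000 training examples with the default settings.
summary = schedule_summary(num_examples=1000)
print(summary)
```

Raising gradient_accumulation while keeping batch_size fixed increases the effective batch size without increasing per-step memory use, at the cost of fewer optimizer updates per epoch.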

TokenClassificationParams is the configuration class that holds the training parameters for token classification.
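
A minimal sketch of how the class might be configured, using only the parameter names listed above. The dataset path, project name, and override values are placeholders, and the snippet assumes the class accepts these names as keyword arguments, as the signature suggests. The import is guarded so the example also runs where AutoTrain is not installed.

```python
# Overrides for a hypothetical NER project. Any parameter not listed here
# keeps its documented default.
overrides = {
    "data_path": "path/to/dataset",      # placeholder path
    "model": "bert-base-uncased",
    "project_name": "my-ner-project",    # placeholder name
    "lr": 3e-5,
    "epochs": 5,
    "max_seq_length": 256,
    "train_split": "train",
    "valid_split": "validation",
    "tokens_column": "tokens",
    "tags_column": "tags",
    "mixed_precision": "fp16",
}

try:
    from autotrain.trainers.token_classification.params import (
        TokenClassificationParams,
    )
except ImportError:
    # AutoTrain is not available in this environment.
    TokenClassificationParams = None

params = None
if TokenClassificationParams is not None:
    # Assumption: the class takes the documented names as keyword arguments.
    params = TokenClassificationParams(**overrides)
    print(params)
else:
    print("autotrain is not installed; overrides would be:", overrides)
```

Setting valid_split matters here because eval_strategy and the early stopping parameters only have something to act on when a validation split is available.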
