AutoTrain documentation

Tabular Parameters

You are viewing v0.8.17 version. A newer version v0.8.24 is available.

--batch-size BATCH_SIZE
                    Training batch size to use
--seed SEED           Random seed for reproducibility
--target-columns TARGET_COLUMNS
                    Specify the names of the target (label) columns, separated by commas if there are multiple. These are the columns the
                    model will predict. Required for defining the output of the model.
--categorical-columns CATEGORICAL_COLUMNS
                    List the names of columns that contain categorical data, useful for models that need explicit handling of such data.
                    Categorical data is typically processed differently from numerical data, such as through encoding. If not specified, the
                    model will infer the data type.
--numerical-columns NUMERICAL_COLUMNS
                    Identify columns that contain numerical data. Proper specification helps in applying appropriate scaling and normalization
                    techniques, which can significantly impact model performance. If not specified, the model will infer the data type.
--id-column ID_COLUMN
                    Specify the column name that uniquely identifies each row in the dataset. This is critical for tracking samples through the
                    model pipeline and is often excluded from model training. Required field.
--task {classification,regression}
                    Define the type of machine learning task: 'classification' or 'regression'. This parameter determines the model's
                    architecture and the loss function to use. Required to properly configure the model.
--num-trials NUM_TRIALS
                    Set the number of trials for hyperparameter tuning or model experimentation. More trials can lead to better model
                    configurations but require more computational resources. Default is 100 trials.
--time-limit TIME_LIMIT
                    Impose a time limit (in seconds) for training or searching for the best model configuration. This helps manage resource
                    allocation and ensures the process does not exceed available computational budgets. The default is 3600 seconds (1 hour).
--categorical-imputer {most_frequent,None}
                    Select the strategy used to impute missing values in categorical columns. Options are 'most_frequent' and 'None'.
                    Correct imputation can prevent biases and improve model accuracy.
--numerical-imputer {mean,median,None}
                    Choose the imputation strategy for missing values in numerical columns. Options are 'mean', 'median', and 'None'.
                    Accurate imputation is vital for maintaining the integrity of numerical data.
--numeric-scaler {standard,minmax,normal,robust}
                    Determine the type of scaling to apply to numerical data. Options are 'standard' (zero mean and unit variance), 'minmax'
                    (scaled to a given range), 'normal', and 'robust'. Scaling is essential for many algorithms to perform optimally.
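
To make the imputation and scaling options above more concrete, here is a minimal plain-Python sketch of what the 'mean', 'median', and 'most_frequent' imputers and the 'standard' and 'minmax' scalers conventionally compute. It is not AutoTrain's implementation: the helper names `impute` and `scale` and the sample columns are hypothetical, the min-max range is assumed to be [0, 1], and the 'normal' and 'robust' scalers are left out.

```python
from collections import Counter
from statistics import mean, median, pstdev


def impute(values, strategy):
    """Fill None entries using 'mean', 'median', or 'most_frequent'."""
    present = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(present)
    elif strategy == "median":
        fill = median(present)
    elif strategy == "most_frequent":
        fill = Counter(present).most_common(1)[0][0]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def scale(values, method):
    """Rescale a numeric column; assumes the column is not constant."""
    if method == "standard":
        # Subtract the mean and divide by the population standard deviation.
        mu, sigma = mean(values), pstdev(values)
        return [(v - mu) / sigma for v in values]
    if method == "minmax":
        # Map the smallest value to 0 and the largest to 1 (assumed range).
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]
    raise ValueError(f"unknown method: {method}")


# Hypothetical sample columns with one missing value each.
ages = impute([20, None, 40, 60], "median")
colors = impute(["red", None, "red", "blue"], "most_frequent")
scaled = scale(ages, "minmax")
print(ages, colors, scaled)
```

Choosing 'None' for an imputer would correspond to skipping the `impute` step entirely, which leaves missing values for the downstream model to handle.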
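
The options listed on this page can be combined into one argument list. The sketch below assembles such a list in Python and prints it as a shell-ready string. Several things here are assumptions rather than documented facts: that the options belong to an `autotrain tabular` subcommand, the helper name `build_args`, and the column names. Options not covered on this page (for example, where the data lives) are omitted, so the printed command is illustrative rather than complete.

```python
import shlex

# Assumption: these options are passed to an `autotrain tabular` subcommand.
BASE_COMMAND = ["autotrain", "tabular"]


def build_args(task, id_column, target_columns, **extra):
    """Build an argument list from the options documented on this page."""
    if task not in ("classification", "regression"):
        raise ValueError(f"unsupported task: {task}")
    args = BASE_COMMAND + [
        "--task", task,
        "--id-column", id_column,
        # Multiple target columns are joined with commas.
        "--target-columns", ",".join(target_columns),
    ]
    # Turn keyword names such as num_trials into flags such as --num-trials.
    for name, value in extra.items():
        args += ["--" + name.replace("_", "-"), str(value)]
    return args


# Hypothetical dataset with an id column and two regression targets.
args = build_args(
    "regression",
    "row_id",
    ["price", "tax"],
    num_trials=20,
    time_limit=600,
    numerical_imputer="median",
    numeric_scaler="standard",
)
print(shlex.join(args))
```

Keeping the arguments as a list until the last moment avoids quoting mistakes, because `shlex.join` only adds quotes where a value needs them.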