graph LR
    Entry_Point["Entry Point"]
    Configuration["Configuration"]
    Model_Abstraction["Model Abstraction"]
    Data_Pipeline["Data Pipeline"]
    Training_Logic["Training Logic"]
    Utilities["Utilities"]
    Scripts["Scripts"]
    Requirements_Management["Requirements Management"]
    Entry_Point -- "initializes" --> Configuration
    Entry_Point -- "initializes" --> Model_Abstraction
    Entry_Point -- "initializes" --> Data_Pipeline
    Entry_Point -- "invokes" --> Training_Logic
    Configuration -- "provides settings to" --> Model_Abstraction
    Configuration -- "provides settings to" --> Data_Pipeline
    Configuration -- "provides settings to" --> Training_Logic
    Model_Abstraction -- "provides model to" --> Training_Logic
    Data_Pipeline -- "provides data to" --> Training_Logic
    Training_Logic -- "utilizes" --> Model_Abstraction
    Training_Logic -- "utilizes" --> Data_Pipeline
    Training_Logic -- "utilizes" --> Configuration
    Training_Logic -- "utilizes" --> Utilities
    Data_Pipeline -- "uses" --> Utilities
    Model_Abstraction -- "uses" --> Utilities
    Scripts -- "supports" --> Data_Pipeline
    Scripts -- "supports" --> Model_Abstraction
    Requirements_Management -- "defines environment for" --> Entry_Point
    Requirements_Management -- "defines environment for" --> Configuration
    Requirements_Management -- "defines environment for" --> Model_Abstraction
    Requirements_Management -- "defines environment for" --> Data_Pipeline
    Requirements_Management -- "defines environment for" --> Training_Logic
    Requirements_Management -- "defines environment for" --> Utilities
    Requirements_Management -- "defines environment for" --> Scripts
    click Entry_Point href "https://github.com/Josephrp/SmolFactory/blob/main/docs/Entry_Point.md" "Details"
    click Model_Abstraction href "https://github.com/Josephrp/SmolFactory/blob/main/docs/Model_Abstraction.md" "Details"
    click Data_Pipeline href "https://github.com/Josephrp/SmolFactory/blob/main/docs/Data_Pipeline.md" "Details"

Details

Component overview for the Machine Learning Training and Fine-tuning Framework.

Entry Point

The primary execution script that orchestrates the entire training process. It initializes all other major components, loads configurations, sets up the training environment, and invokes the core training logic.

Related Classes/Methods:

  • train.py
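
The orchestration described above can be sketched roughly as follows. This is a minimal illustration, not the contents of train.py: `TrainConfig`, `build_model`, `build_data`, `run_training`, and the command-line flags are all hypothetical stand-ins for the Configuration, Model Abstraction, Data Pipeline, and Training Logic components.

```python
import argparse
from dataclasses import dataclass


@dataclass
class TrainConfig:
    """Hypothetical settings object; field names are illustrative only."""
    model_name: str = "example-model"
    dataset_path: str = "data/train.jsonl"
    epochs: int = 1


def build_model(config: TrainConfig) -> dict:
    # Stand-in for the Model Abstraction component.
    return {"name": config.model_name}


def build_data(config: TrainConfig) -> list:
    # Stand-in for the Data Pipeline component.
    return [f"sample from {config.dataset_path}"]


def run_training(config: TrainConfig, model: dict, data: list) -> dict:
    # Stand-in for the Training Logic component.
    return {"model": model["name"], "epochs": config.epochs, "samples": len(data)}


def main(argv=None) -> dict:
    # 1. Load configuration (here: defaults plus command-line overrides).
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    parser.add_argument("--model-name", default="example-model")
    parser.add_argument("--epochs", type=int, default=1)
    args = parser.parse_args(argv)
    config = TrainConfig(model_name=args.model_name, epochs=args.epochs)

    # 2. Initialize the model and the data pipeline from that configuration.
    model = build_model(config)
    data = build_data(config)

    # 3. Hand everything to the training logic.
    return run_training(config, model, data)


summary = main(["--epochs", "3"])
print(summary)
```

Keeping the entry point this thin means each component can be tested on its own, and the script itself only decides the order in which they are wired together.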

Configuration

Centralized management of all training parameters, model hyperparameters, dataset paths, and other environment settings. It defines the schema for configurations, often using dataclasses, and supports both base and custom configurations.

Related Classes/Methods:

  • config/ (1:1)
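
As a rough illustration of a dataclass-based schema with a base and a custom configuration, consider the sketch below. The class names, fields, and default values are assumptions made for the example and are not taken from the config/ package.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class BaseConfig:
    # Every field name and default here is an illustrative assumption.
    model_name: str = "example-model"
    dataset_path: str = "data/train.jsonl"
    learning_rate: float = 2e-5
    batch_size: int = 8
    max_steps: int = 1000

    def __post_init__(self):
        # Validate once at construction so bad values fail early.
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")


@dataclass(frozen=True)
class LongContextConfig(BaseConfig):
    # A custom configuration: overrides one default and adds one field.
    batch_size: int = 2
    max_seq_length: int = 8192


base = BaseConfig()
custom = LongContextConfig(learning_rate=1e-5)

# replace() returns a modified copy, leaving the frozen original untouched.
quick_run = replace(base, max_steps=10)

print(asdict(custom))
```

Frozen dataclasses are used here so a configuration cannot be mutated halfway through a run; that is a design choice of this sketch, not a documented property of the project.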

Model Abstraction

Responsible for abstracting the underlying machine learning model. This includes loading pre-trained models, handling different model architectures or variants, and preparing the model for training (e.g., quantization, device placement).
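
One common way to handle several model variants behind a single loading function is a small registry, sketched below. It only illustrates variant selection and passing a device string through; actual weight loading and device placement are omitted, and `ModelHandle`, `register`, `load_model`, and the variant names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ModelHandle:
    # Placeholder for a real model object; fields are illustrative.
    name: str
    device: str
    num_layers: int


_REGISTRY: Dict[str, Callable[[str], ModelHandle]] = {}


def register(variant: str):
    """Decorator that records a factory function under a variant name."""
    def decorator(factory: Callable[[str], ModelHandle]):
        _REGISTRY[variant] = factory
        return factory
    return decorator


@register("small")
def _build_small(device: str) -> ModelHandle:
    return ModelHandle(name="small", device=device, num_layers=4)


@register("large")
def _build_large(device: str) -> ModelHandle:
    return ModelHandle(name="large", device=device, num_layers=24)


def load_model(variant: str, device: str = "cpu") -> ModelHandle:
    # Callers name a variant; the registry hides how each one is built.
    try:
        factory = _REGISTRY[variant]
    except KeyError:
        known = sorted(_REGISTRY)
        raise ValueError(f"unknown variant {variant!r}; known: {known}") from None
    return factory(device)


model = load_model("large", device="cuda:0")
print(model)
```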

Related Classes/Methods:

Data Pipeline

Manages the entire data flow, from loading raw datasets to preprocessing, tokenization, and creating efficient data loaders (e.g., PyTorch DataLoader) for batching and shuffling data during training and evaluation.
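
The load, tokenize, batch, and shuffle stages can be sketched in plain Python as below. This is a toy stand-in for what a framework data loader does, not the project's pipeline: the whitespace tokenizer, the `batches` helper, and the sample sentences are all assumptions made for illustration.

```python
import random
from typing import Dict, Iterator, List, Sequence


def tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    # Toy whitespace tokenizer: unseen words get the next free integer id.
    return [vocab.setdefault(word, len(vocab)) for word in text.lower().split()]


def batches(
    examples: Sequence[List[int]],
    batch_size: int,
    shuffle: bool,
    seed: int = 0,
) -> Iterator[List[List[int]]]:
    # Shuffle indices rather than the data so the input is left untouched.
    order = list(range(len(examples)))
    if shuffle:
        random.Random(seed).shuffle(order)
    # The final batch may be smaller than batch_size.
    for start in range(0, len(order), batch_size):
        yield [examples[i] for i in order[start:start + batch_size]]


raw = ["the cat sat", "the dog ran", "a bird flew", "the cat ran", "a dog sat"]
vocab: Dict[str, int] = {}
encoded = [tokenize(text, vocab) for text in raw]

train_batches = list(batches(encoded, batch_size=2, shuffle=True, seed=42))
eval_batches = list(batches(encoded, batch_size=2, shuffle=False))
```

Training data is shuffled with a fixed seed so runs are repeatable, while evaluation data keeps its original order.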

Related Classes/Methods:

Training Logic

Encapsulates the core training loop, including forward and backward passes, loss calculation, optimization steps, and integration of callbacks for monitoring and control. It may include specialized trainers for different fine-tuning methods.
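
To make the loop structure concrete, here is a minimal sketch that fits a one-variable linear model with plain stochastic gradient descent and supports callbacks. It is an illustration under simplifying assumptions (hand-written gradients, no framework); the `train` function and both callbacks are hypothetical and do not describe the project's trainers.

```python
from typing import Callable, List, Tuple

Callback = Callable[[int, float], bool]


def train(
    data: List[Tuple[float, float]],
    epochs: int,
    lr: float,
    callbacks: List[Callback],
) -> Tuple[float, float]:
    w, b = 0.0, 0.0
    for epoch in range(epochs):
        total_loss = 0.0
        for x, y in data:
            pred = w * x + b            # forward pass
            error = pred - y
            total_loss += error ** 2    # squared-error loss
            grad_w = 2 * error * x      # backward pass (analytic gradients)
            grad_b = 2 * error
            w -= lr * grad_w            # optimization step (plain SGD)
            b -= lr * grad_b
        mean_loss = total_loss / len(data)
        # Run every callback; any callback returning True requests a stop.
        if any([cb(epoch, mean_loss) for cb in callbacks]):
            break
    return w, b


history: List[float] = []


def record_loss(epoch: int, loss: float) -> bool:
    history.append(loss)   # monitoring only, never stops training
    return False


def stop_when_converged(epoch: int, loss: float) -> bool:
    return loss < 1e-6     # control: stop early once the loss is tiny


# Noise-free samples of y = 2x + 1.
samples = [(float(x), 2.0 * x + 1.0) for x in range(5)]
w, b = train(samples, epochs=1000, lr=0.01, callbacks=[record_loss, stop_when_converged])
```

Separating monitoring and stopping into callbacks keeps the loop body unchanged when logging or stopping rules change.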

Related Classes/Methods:

Utilities

Provides a collection of common helper functions, classes, and modules used across various components. This includes functionalities like logging, metric calculation, checkpointing, and general data manipulation.

Related Classes/Methods:

  • utils/ (1:1)
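
The kinds of helpers listed above might look like the following sketch, which combines a logger, a running-average metric, and a checkpoint save/load pair. All names are hypothetical, and JSON is used only to keep the example dependency-free; checkpoints that hold model weights would normally use the training framework's own serialization.

```python
import json
import logging
import tempfile
from pathlib import Path

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("train_utils")  # logger name is an arbitrary choice


class RunningAverage:
    """Tracks the mean of the values seen so far."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, value: float) -> None:
        self.total += value
        self.count += 1

    @property
    def value(self) -> float:
        return self.total / self.count if self.count else 0.0


def save_checkpoint(path: Path, step: int, state: dict) -> None:
    # Write to a temporary file first, then rename, so a crash mid-write
    # does not leave a truncated checkpoint at the final path.
    tmp_path = path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps({"step": step, "state": state}))
    tmp_path.replace(path)


def load_checkpoint(path: Path) -> dict:
    return json.loads(path.read_text())


loss_avg = RunningAverage()
for loss in (0.9, 0.7, 0.5):
    loss_avg.update(loss)

with tempfile.TemporaryDirectory() as tmp_dir:
    ckpt_path = Path(tmp_dir) / "checkpoint.json"
    save_checkpoint(ckpt_path, step=3, state={"mean_loss": loss_avg.value})
    restored = load_checkpoint(ckpt_path)
    logger.info("restored checkpoint from step %d", restored["step"])
```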

Scripts

Contains auxiliary scripts that support the overall project but are separate from the main training pipeline. Examples include data preparation scripts, model conversion tools, or deployment-related utilities.

Related Classes/Methods:

  • scripts/ (1:1)

Requirements Management

Defines and manages all project dependencies, ensuring a consistent and reproducible development and deployment environment. This typically involves requirements.txt files or similar dependency management tools.

Related Classes/Methods:

  • requirements/ (1:1)
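
One way to check that an environment matches pinned dependencies is sketched below. It is a simplified illustration that only understands exact `name==version` lines and ignores extras, environment markers, and version ranges; the function names and the example package names are hypothetical.

```python
from importlib import metadata
from typing import Callable, Dict, List, Optional


def parse_pins(text: str) -> Dict[str, str]:
    """Extract exact pins (name==version) from requirements-style text."""
    pins: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def installed_version(name: str) -> Optional[str]:
    # Returns None when the package is not installed.
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> List[str]:
    problems = []
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            problems.append(f"{name}: wanted {wanted}, found {found}")
    return problems


example_text = "# training deps\nexamplepkg==1.2.0\notherpkg==0.3.1  # pinned\n"
example_pins = parse_pins(example_text)

# A fake lookup keeps the example independent of what is installed locally.
report = find_mismatches(example_pins, lookup={"examplepkg": "1.2.0"}.get)
print(report)
```

Calling `find_mismatches(example_pins)` without the `lookup` argument would compare against the packages actually installed in the current environment.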
