LightGBM Time Series Forecasting Pipeline

This repository contains a complete, reusable forecasting pipeline built on LightGBM models.

The pipeline includes:

  • 9 trained LightGBM models
  • feature engineering pipeline
  • blending configuration
  • training statistics
  • reusable inference workflow

The models were serialized into a single .pkl file for easy deployment and reuse.


Repository Contents

File                    Description
full_pipeline.pkl       Serialized pipeline containing all trained models
inference_example.py    Example script for inference
README.md               Documentation
requirements.txt        Required dependencies

Included Pipeline Objects

The pickle file contains a dictionary with the following keys:

pipeline.keys()

# horizon_models
# subcat_models
# train_stats
# blend_scores
# params
# blend_power
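
To get a quick overview of what sits behind each key, a small helper like the one below can be used. It is only a sketch: `describe_pipeline` is a hypothetical name, not part of this repository, and the `example` dictionary holds placeholder values instead of the real trained models.

```python
def describe_pipeline(pipeline):
    """Return a {key: short description} summary of a pipeline dictionary."""
    summary = {}
    for key, value in pipeline.items():
        if isinstance(value, dict):
            summary[key] = f"dict with {len(value)} entries"
        elif isinstance(value, (list, tuple)):
            summary[key] = f"{type(value).__name__} with {len(value)} items"
        else:
            summary[key] = type(value).__name__
    return summary


# Placeholder dictionary standing in for the object returned by joblib.load(...)
example = {"horizon_models": {1: None, 2: None}, "blend_power": 2.0}

for key, text in describe_pipeline(example).items():
    print(f"{key}: {text}")
```

Once the real file is loaded, the same call would be `describe_pipeline(pipeline)`.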

Installation

pip install lightgbm joblib pandas numpy huggingface_hub

Download Model

from huggingface_hub import hf_hub_download
import joblib

REPO_ID = "andrewmos/lightbm-ts-forecasting-kaggle"

model_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="full_pipeline.pkl",
    repo_type="model"
)

pipeline = joblib.load(model_path)

print("Pipeline loaded successfully")

Load Models

horizon_models = pipeline["horizon_models"]
subcat_models = pipeline["subcat_models"]

train_stats = pipeline["train_stats"]

blend_scores = pipeline["blend_scores"]

params = pipeline["params"]

blend_power = pipeline["blend_power"]

Important Note About Feature Engineering

The .pkl file stores the trained models, but it does NOT automatically store the preprocessing logic.

You must recreate the same feature engineering pipeline used during training before running inference.

Example:

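import pandas as pd

def create_features(df):
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()

    # Parse the date column once and derive calendar features from it
    dates = pd.to_datetime(df["date"])
    df["month"] = dates.dt.month
    df["year"] = dates.dt.year

    return df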
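
Because the preprocessing has to be rebuilt by hand, it can help to confirm that the recreated features cover everything a model expects before calling `predict`. The sketch below assumes the stored feature list is a plain sequence of column names; `missing_features` is a hypothetical helper, not something shipped in the pickle.

```python
def missing_features(available_columns, required_features):
    """Return the required feature names that are absent from the available columns."""
    available = set(available_columns)
    return [name for name in required_features if name not in available]


# Illustrative column names only; the real lists come from your data and the model entry
columns_after_engineering = ["date", "month", "year"]
features_expected_by_model = ["month", "year", "lag_7"]

print(missing_features(columns_after_engineering, features_expected_by_model))
```

With real objects, the call could look like `missing_features(test_df.columns, model_info["features"])`. An empty list means every expected column is present.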

Example Inference

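# Select the entry stored under key 1 in horizon_models
model_info = horizon_models[1]

model = model_info["model"]
features = model_info["features"]

# test_df must already contain the engineered features used during training
X_test = test_df[features]

predictions = model.predict(X_test)

print(predictions)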
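
The same pattern can be repeated for every entry in `horizon_models`. The sketch below assumes that each entry has the same layout as the one shown above (a dictionary with `"model"` and `"features"` keys); this has not been verified for all entries, and `predict_all_horizons` is a hypothetical helper name.

```python
def predict_all_horizons(horizon_models, test_df):
    """Run each stored model on its own feature subset.

    Assumes every value in horizon_models is a dict with "model" and
    "features" keys, like the single entry used in the example above.
    """
    results = {}
    for key, model_info in horizon_models.items():
        model = model_info["model"]
        features = model_info["features"]
        # Select only the columns this particular model was trained on
        results[key] = model.predict(test_df[features])
    return results


# Usage once the pipeline is loaded and test_df holds the engineered features:
# all_predictions = predict_all_horizons(horizon_models, test_df)
```

The result is a dictionary keyed the same way as `horizon_models`. How these predictions are combined with `blend_scores` and `blend_power` is not covered here.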

Reproducibility

Recommended package versions:

lightgbm>=4.0
numpy>=1.24
pandas>=2.0
joblib>=1.3

Using compatible versions helps avoid serialization issues.


Use Cases

This repository can be used for:

  • time series forecasting
  • reusable inference pipelines
  • Kaggle competitions
  • LightGBM deployment examples
  • tabular ML workflows

Author

Andrés Mosquera
