LightGBM Time Series Forecasting Pipeline
This repository contains a complete, reusable forecasting pipeline built on LightGBM models.
The pipeline includes:
- 9 trained LightGBM models
- feature engineering pipeline
- blending configuration
- training statistics
- reusable inference workflow
The models were serialized into a single .pkl file for easy deployment and reuse.
Repository Contents
| File | Description |
|---|---|
| full_pipeline.pkl | Serialized pipeline containing all trained models |
| inference_example.py | Example script for inference |
| README.md | Documentation |
| requirements.txt | Required dependencies |
Included Pipeline Objects
The pickle file contains:
pipeline.keys()
# horizon_models
# subcat_models
# train_stats
# blend_scores
# params
# blend_power
Installation
pip install lightgbm joblib pandas numpy huggingface_hub
Download Model
from huggingface_hub import hf_hub_download
import joblib
REPO_ID = "andrewmos/lightbm-ts-forecasting-kaggle"
model_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="full_pipeline.pkl",
    repo_type="model",
)
pipeline = joblib.load(model_path)
print("Pipeline loaded successfully")
Load Models
horizon_models = pipeline["horizon_models"]
subcat_models = pipeline["subcat_models"]
train_stats = pipeline["train_stats"]
blend_scores = pipeline["blend_scores"]
params = pipeline["params"]
blend_power = pipeline["blend_power"]
Important Note About Feature Engineering
The .pkl file stores the trained models, but it does NOT automatically store the preprocessing logic.
You must recreate the same feature engineering pipeline used during training before running inference.
Example:
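import pandas as pd

def create_features(df):
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()
    # Parse the date column once and derive calendar features from it
    dates = pd.to_datetime(df["date"])
    df["month"] = dates.dt.month
    df["year"] = dates.dt.year
    return df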
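Because the pickle does not carry the preprocessing code, it can help to confirm that the recreated features cover everything a model expects before calling it. The sketch below is a minimal, hypothetical helper (`missing_features` is not part of this repository). It assumes the expected names come from a model entry's `features` list, for example `horizon_models[1]["features"]`, and the column names used here are purely illustrative.

```python
def missing_features(expected, available):
    """Return the expected feature names that are absent from `available`, in order."""
    available_set = set(available)
    return [name for name in expected if name not in available_set]


# Illustrative names only; in practice pass model_info["features"] and df.columns.
expected = ["month", "year", "lag_1"]
available = ["date", "month", "year"]

missing = missing_features(expected, available)
if missing:
    print("Missing feature columns:", missing)
```

With a DataFrame, the same call would be `missing_features(features, df.columns)`; an empty list means every expected column is present.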
Example Inference
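# Select the entry stored under key 1 and the feature columns its model expects
model_info = horizon_models[1]
model = model_info["model"]
features = model_info["features"]
# test_df is your own DataFrame, already passed through the feature engineering step
X_test = test_df[features]
predictions = model.predict(X_test)
print(predictions)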
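The single-model example can be extended to every stored model. The following is a sketch under an assumption that this README does not confirm: that `horizon_models` is a dict whose values all have the same `"model"` and `"features"` layout as the entry under key 1. The function name `predict_all_horizons` is hypothetical.

```python
def predict_all_horizons(horizon_models, df):
    """Run every stored model on `df` and return {key: predictions}.

    Assumes each value of `horizon_models` is a dict with a "model"
    (any object exposing .predict) and a "features" list. This layout
    is an assumption based on the single-model example.
    """
    results = {}
    for key, info in horizon_models.items():
        X = df[info["features"]]  # select exactly the columns this model expects
        results[key] = info["model"].predict(X)
    return results
```

Called as `predict_all_horizons(horizon_models, test_df)`, it would return one prediction array per key. How those per-model predictions are combined using `blend_scores` and `blend_power` is not documented here, so the sketch leaves them unblended.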
Reproducibility
Recommended package versions:
lightgbm>=4.0
numpy>=1.24
pandas>=2.0
joblib>=1.3
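The pipeline is stored as a pickle, so loading it with library versions that differ substantially from those used at training time can fail or emit warnings. Using compatible versions helps avoid these serialization issues.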
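The recommended minimums can be checked against the current environment before loading the pickle. The sketch below uses only the standard library; the helper names `parse_major_minor` and `check_versions` are hypothetical, and the comparison looks at major and minor version numbers only, which is a simplification.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the recommendations listed in this section.
MINIMUMS = {"lightgbm": (4, 0), "numpy": (1, 24), "pandas": (2, 0), "joblib": (1, 3)}


def parse_major_minor(text):
    """Turn a version string such as '2.1.4' into (2, 1); a missing minor part counts as 0."""
    match = re.match(r"(\d+)(?:\.(\d+))?", text)
    if match is None:
        return (0, 0)
    return (int(match.group(1)), int(match.group(2) or 0))


def check_versions(minimums=MINIMUMS):
    """Return {package: problem} for packages that are missing or older than the minimum."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if parse_major_minor(installed) < minimum:
            problems[name] = f"{installed} is older than {minimum[0]}.{minimum[1]}"
    return problems


print(check_versions() or "All recommended minimums are satisfied")
```

An empty result only means the minimums are met; it does not guarantee that the pickle was created with the same versions.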
Use Cases
This repository can be used for:
- time series forecasting
- reusable inference pipelines
- Kaggle competitions
- LightGBM deployment examples
- tabular ML workflows
Author
Andrés Mosquera