player-value-lgbm

LightGBM model that forecasts the 6-month log-change in Transfermarkt market value for football players. Trained on the engineered features in player-value-features.

Performance (held-out test, ≥ 2024)

Model              Test MAE (log)   Test R²
Naive (y=0)        0.2145           −0.006
Ridge regression   0.2164           +0.130
AutoETS            0.2361           −0.198
AutoTheta          0.2740           −0.298
LightGBM           0.1970           +0.205

LightGBM's test R² (+0.205) is 0.075 above the best baseline (Ridge, +0.130), a relative gain of roughly 58%. It is also the only model whose test MAE beats the naive predictor (0.1970 vs 0.2145, about 8% lower).
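
For reference, the two metrics in the table can be computed as sketched below. This uses made-up numbers and is not the repo's evaluation code; `mae` and `r2` are illustrative helper names. It also shows why the naive predictor can land slightly below zero on R²: R² compares against predicting the test-set mean, and a constant 0 does worse than that whenever the mean log-change is not exactly 0.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error in log space."""
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up 6-month log-changes, for illustration only (mean is 0.04, not 0).
y_true = np.array([0.10, -0.05, 0.00, 0.30, -0.15])

# Naive baseline: predict "no change" for every player.
naive_pred = np.zeros_like(y_true)

naive_mae = mae(y_true, naive_pred)
naive_r2 = r2(y_true, naive_pred)   # negative, because mean(y_true) != 0
```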

Architecture

  • LightGBM regression with L1 objective (MAE)
  • 351 trees (early-stopped on validation)
  • num_leaves=127, learning_rate=0.05, feature_fraction=0.8, bagging_fraction=0.8, lambda_l1=0.1, lambda_l2=0.1
  • 84 features (92 raw, 8 dropped by the audit), 5 categorical handled natively (position, sub_position, foot, career_stage, country_of_citizenship)
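
As a rough sketch, the settings listed above map onto a LightGBM parameter dictionary like the one below. The authoritative configuration is in scripts/train_lgbm.py. `bagging_freq` is an assumption: it is not stated in this card, but LightGBM only applies `bagging_fraction` when `bagging_freq` is non-zero. The names `params`, `CATEGORICAL_FEATURES` and `NUM_TREES` are illustrative.

```python
# Hyperparameters as listed in this card; keys are standard LightGBM parameter names.
params = {
    "objective": "regression_l1",   # L1 objective, i.e. optimizes MAE
    "num_leaves": 127,
    "learning_rate": 0.05,
    "feature_fraction": 0.8,
    "bagging_fraction": 0.8,
    "bagging_freq": 1,              # ASSUMPTION: value not given in the card
    "lambda_l1": 0.1,
    "lambda_l2": 0.1,
}

# The five columns the card says are handled as native categoricals.
CATEGORICAL_FEATURES = [
    "position",
    "sub_position",
    "foot",
    "career_stage",
    "country_of_citizenship",
]

# Number of trees kept after early stopping on the validation set.
NUM_TREES = 351

# Typical use (not executed here; the round limit and patience are not stated in the card):
#   import lightgbm as lgb
#   booster = lgb.train(params, train_set, valid_sets=[valid_set],
#                       callbacks=[lgb.early_stopping(stopping_rounds=...)])
```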

Top features by gain

  1. country_of_citizenship — the market prices passport over performance
  2. as_of_year — captures market inflation
  3. value_diff_1 — recent momentum
  4. log_value — base level
  5. age
  6. as_of_month — transfer-window seasonality
  7. value_diff_3
  8. international_caps
  9. age_minus_position_peak — distance from position-specific peak
  10. value_lag_1
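
A ranking like the one above can be read back from the downloaded model. The sketch below assumes `booster` is a loaded `lgb.Booster`; `top_features_by_gain` is an illustrative helper name, not something shipped in the repo.

```python
def top_features_by_gain(booster, k=10):
    """Return the k (feature_name, total_gain) pairs with the highest gain."""
    names = booster.feature_name()
    gains = booster.feature_importance(importance_type="gain")
    ranked = sorted(zip(names, gains), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Calling `top_features_by_gain(booster)` on the released model should give the ordering listed in this section, provided the model file is the one the card describes.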

How to use

import lightgbm as lgb
import pandas as pd
from huggingface_hub import hf_hub_download

model_path = hf_hub_download("DanielRegaladoCardoso/player-value-lgbm",
                              "model.lgb")
booster = lgb.Booster(model_file=model_path)

# X must be a DataFrame whose categorical columns are pd.Categorical
# (see scripts/train_lgbm.py in the repo for the exact preprocessing)
preds = booster.predict(X)        # predicts log(value_T+6mo / value_T)
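
The categorical requirement in the comment above can be met with a cast along these lines. This is a minimal sketch, not the repo's preprocessing: the toy row is made up and contains only the five categorical columns, whereas a real `X` needs all 84 features in the training order. Whether category levels must also be aligned by hand depends on scripts/train_lgbm.py, so treat this as a starting point.

```python
import pandas as pd

# Categorical columns named in the Architecture section.
CATEGORICAL_COLS = [
    "position",
    "sub_position",
    "foot",
    "career_stage",
    "country_of_citizenship",
]

def cast_categoricals(df, cols=CATEGORICAL_COLS):
    """Return a copy of df with the given columns cast to pandas 'category' dtype."""
    out = df.copy()
    for col in cols:
        out[col] = out[col].astype("category")
    return out

# Toy frame with made-up values, for illustration only.
raw = pd.DataFrame({
    "position": ["Attack"],
    "sub_position": ["Centre-Forward"],
    "foot": ["right"],
    "career_stage": ["peak"],
    "country_of_citizenship": ["Spain"],
})
X_cat = cast_categoricals(raw)
```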

To convert a prediction back into euros:

import numpy as np
predicted_value_eur = current_value_eur * np.exp(predicted_log_ratio)  # predicted_log_ratio: output of booster.predict
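
A worked example of that conversion, with made-up inputs; `to_eur` is an illustrative helper name. A log-ratio of 0 leaves the value unchanged, positive values scale it up and negative values scale it down.

```python
import numpy as np

def to_eur(current_value_eur, predicted_log_ratio):
    """Apply a predicted log-change to current market values (in euros)."""
    return np.asarray(current_value_eur, dtype=float) * np.exp(predicted_log_ratio)

# Made-up current values and predicted log-ratios, for illustration only.
current = np.array([10_000_000.0, 2_500_000.0])
log_ratio = np.array([0.10, -0.20])

predicted = to_eur(current, log_ratio)
# exp(0.10) is about 1.105 and exp(-0.20) is about 0.819,
# so the results are roughly 11.05M and 2.05M euros.
```

The same transformation gives a feel for the test error: if a prediction were off by exactly the test MAE of 0.197 in log space, the euro value would be off by a factor of exp(±0.197), roughly +22% or −18%.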

Limitations

  • The 6-month horizon is a median, not a fixed interval: Transfermarkt updates valuations irregularly, 2–3× per year. The model is trained on that irregular cadence, so predictions are not calibrated for a fixed 180-day horizon.
  • R² of +0.20 means ~80% of valuation-change variance is NOT explained by these features — markets are noisy.
  • country_of_citizenship being the strongest feature reflects market bias the model captures (and does not endorse).
  • Trained only on Transfermarkt's tracked competitions; lower-tier leagues are under-represented.
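
To see how far the realized horizon strays from 6 months on a given dataset, the gaps between consecutive valuations can be inspected directly. The sketch below assumes a long-format table with hypothetical columns `player_id` and `date`; these names and the toy rows are not taken from the repo.

```python
import pandas as pd

def valuation_gap_days(valuations):
    """Days between consecutive valuations of the same player."""
    ordered = valuations.sort_values(["player_id", "date"])
    gaps = ordered.groupby("player_id")["date"].diff().dt.days
    return gaps.dropna()   # the first valuation of each player has no gap

# Made-up valuation dates, for illustration only.
toy = pd.DataFrame({
    "player_id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime([
        "2023-01-01", "2023-06-15", "2024-01-10",
        "2023-03-01", "2023-10-01",
    ]),
})

gaps = valuation_gap_days(toy)
median_gap = gaps.median()
```

A wide spread around the median would suggest treating each prediction as "change until the next valuation update" rather than "change in exactly 180 days".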

License

Apache 2.0.
