Cross-store matching CatBoost classifier

Binary classifier for cross-store product variant matching. The train, validation, and holdout sets are drawn from the Hugging Face dataset listed under Config.

Thresholds

Holdout: "20260214_consideration_50k"

Recall at Precision=0.90: 0.319810 (threshold=0.7523473006041936)

Recall at Precision=0.95: 0.200178 (threshold=0.891946742060409)

Recall at Precision=0.96: 0.170566 (threshold=0.9176396207992682)

Recall at Precision=0.97: 0.143915 (threshold=0.935318100138531)

Recall at Precision=0.98: 0.087060 (threshold=0.9687166096895236)

Recall at Precision=0.99: 0.058632 (threshold=0.9820673917303737)
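
The thresholds above are cut-offs on the predicted match probability. A minimal sketch of turning scores into match decisions follows. It assumes `probas` holds positive-class probabilities from the model and that a pair counts as a match when its score is at or above the cut-off; the helper name `predict_matches` is illustrative and not part of the released code.

```python
# Probability cut-offs copied from the holdout list above,
# keyed by the target precision they were tuned for.
PRECISION_THRESHOLDS = {
    0.90: 0.7523473006041936,
    0.95: 0.891946742060409,
    0.96: 0.9176396207992682,
    0.97: 0.935318100138531,
    0.98: 0.9687166096895236,
    0.99: 0.9820673917303737,
}


def predict_matches(probas, target_precision=0.95):
    """Return True for each pair whose score reaches the chosen cut-off.

    Assumption: a pair is called a match when proba >= threshold.
    """
    threshold = PRECISION_THRESHOLDS[target_precision]
    return [p >= threshold for p in probas]


print(predict_matches([0.10, 0.90, 0.99], target_precision=0.95))
# [False, True, True]
```

These cut-offs were measured on a single holdout sample, so the realized precision on other data may differ.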

Config

  • HF_DATASET_ID: olegakimovmle/cross-store-matching-variant-catboost-features
  • VERSION (dataset revision): v8_0_0_25022026
  • MODEL_VERSION: v8_0_0_25022026
  • EXPERIMENT_NAME: consider_300k_all_feats

TRAIN_VALID_SAMPLE_SOURCES

[
  "20260206_consideration_100k",
  "20260212_consideration_50k",
  "20260215_consideration_50k",
  "20260217_consideration_50k",
  "20260219_consideration_50k"
]

HOLDOUT_SAMPLE_SOURCES

[
  "20260209_consideration_10k",
  "20260203_search_10k",
  "20260214_consideration_50k"
]
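
The two lists above partition the data by sample source. The sketch below shows one way such a split could be applied. The card does not say how rows are tagged, so the column name `sample_source` and the helper `split_by_source` are hypothetical.

```python
TRAIN_VALID_SAMPLE_SOURCES = [
    "20260206_consideration_100k",
    "20260212_consideration_50k",
    "20260215_consideration_50k",
    "20260217_consideration_50k",
    "20260219_consideration_50k",
]
HOLDOUT_SAMPLE_SOURCES = [
    "20260209_consideration_10k",
    "20260203_search_10k",
    "20260214_consideration_50k",
]

# The two groups must not share a source, otherwise holdout rows leak into training.
assert not set(TRAIN_VALID_SAMPLE_SOURCES) & set(HOLDOUT_SAMPLE_SOURCES)


def split_by_source(rows, source_col="sample_source"):
    """Split an iterable of dict rows into (train_valid, holdout).

    `source_col` is an assumed column name; rows from unlisted sources are dropped.
    """
    train_valid, holdout = [], []
    for row in rows:
        source = row[source_col]
        if source in HOLDOUT_SAMPLE_SOURCES:
            holdout.append(row)
        elif source in TRAIN_VALID_SAMPLE_SOURCES:
            train_valid.append(row)
    return train_valid, holdout
```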

FEATURE_COLS (43)

[
  "old_phash_hamming_distance",
  "old_unique_terms_count",
  "old_common_terms_count",
  "old_bm25_distance",
  "old_product_vendor_bm25distance",
  "old_same_shop",
  "old_are_categories_equal",
  "old_max_common_category_level",
  "old_min_category_precision",
  "old_options_iou",
  "old_cosine_similarity",
  "old_has_different_gender",
  "old_avg_price_difference",
  "new_title_common_prefix_words",
  "new_title_common_prefix_words_pct",
  "new_title_common_suffix_words",
  "new_title_common_suffix_words_pct",
  "new_title_common_set_words",
  "new_title_common_set_words_pct",
  "new_title_common_prefix_letters",
  "new_title_common_prefix_letters_pct",
  "new_title_common_suffix_letters",
  "new_title_common_suffix_letters_pct",
  "new_url_common_prefix_words",
  "new_url_common_prefix_words_pct",
  "new_url_common_prefix_letters",
  "new_url_common_prefix_letters_pct",
  "new_desc_len_ratio",
  "new_desc_len_diff",
  "new_desc_common_word_count",
  "new_desc_overlap_ratio_min",
  "new_desc_overlap_ratio_max",
  "new_desc_word_jaccard",
  "new_desc_left_word_count",
  "new_desc_right_word_count",
  "new_desc_overlap_ratio_left",
  "new_desc_overlap_ratio_right",
  "new_same_phash",
  "new_same_product_type",
  "new_same_handle",
  "new_product_age_days_diff",
  "new_avg_price_ratio",
  "new_same_predicted_category"
]
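
The card lists feature names without defining them. As an illustration only, here is one plausible reading of `new_desc_word_jaccard`: the Jaccard similarity between the word sets of the two product descriptions. The tokenization (lowercasing, whitespace split) and the function name `word_jaccard` are assumptions, and the actual feature may be computed differently.

```python
def word_jaccard(left_desc, right_desc):
    """Jaccard similarity of the two descriptions' word sets.

    Assumption: words are lowercased and split on whitespace.
    Returns 0.0 when both descriptions are empty.
    """
    left_words = set(left_desc.lower().split())
    right_words = set(right_desc.lower().split())
    union = left_words | right_words
    if not union:
        return 0.0
    return len(left_words & right_words) / len(union)


print(word_jaccard("red cotton shirt", "blue cotton shirt"))  # 0.5
```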

CATBOOST_PARAMS

{
  "iterations": 3000,
  "learning_rate": 0.05,
  "depth": 10,
  "loss_function": "Logloss",
  "eval_metric": "PRAUC",
  "random_seed": 42,
  "verbose": 100,
  "early_stopping_rounds": 100,
  "min_data_in_leaf": 50
}
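
A minimal sketch of how these parameters could be passed to CatBoost follows. It assumes the `catboost` package is installed and that the training and validation features are already available as arrays or DataFrames; the helper names `build_classifier` and `fit_classifier` are illustrative, not the author's training code.

```python
CATBOOST_PARAMS = {
    "iterations": 3000,
    "learning_rate": 0.05,
    "depth": 10,
    "loss_function": "Logloss",
    "eval_metric": "PRAUC",
    "random_seed": 42,
    "verbose": 100,
    "early_stopping_rounds": 100,
    "min_data_in_leaf": 50,
}


def build_classifier(params=CATBOOST_PARAMS):
    """Create an untrained CatBoostClassifier from the config above."""
    # Imported lazily so the config can be inspected without catboost installed.
    from catboost import CatBoostClassifier

    return CatBoostClassifier(**params)


def fit_classifier(model, X_train, y_train, X_valid, y_valid):
    """Fit on the training split, monitoring the validation split.

    The validation set is what early_stopping_rounds is evaluated against.
    """
    model.fit(X_train, y_train, eval_set=(X_valid, y_valid))
    return model
```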

Feature importance (full)

feature importance
old_cosine_similarity 17.846883
old_bm25_distance 6.120002
old_product_vendor_bm25distance 5.968252
new_product_age_days_diff 5.397785
old_unique_terms_count 4.823864
old_options_iou 3.758595
old_phash_hamming_distance 3.608113
new_desc_overlap_ratio_min 3.535647
old_avg_price_difference 3.333098
new_desc_len_diff 3.206305
new_avg_price_ratio 3.181133
new_desc_right_word_count 2.997256
new_desc_common_word_count 2.893128
new_desc_len_ratio 2.713265
new_title_common_set_words_pct 2.698235
new_desc_left_word_count 2.620198
old_min_category_precision 2.277187
new_desc_word_jaccard 2.133502
new_title_common_prefix_letters_pct 1.708667
old_same_shop 1.632605
new_title_common_set_words 1.569174
new_desc_overlap_ratio_max 1.526644
old_max_common_category_level 1.518747
old_common_terms_count 1.445524
new_desc_overlap_ratio_right 1.401540
new_desc_overlap_ratio_left 1.246630
new_title_common_prefix_words_pct 1.226824
new_title_common_prefix_letters 1.174842
new_url_common_prefix_letters_pct 1.040912
new_title_common_suffix_letters_pct 0.833897
new_url_common_prefix_letters 0.827906
new_url_common_prefix_words_pct 0.669128
new_same_product_type 0.658220
new_title_common_suffix_words_pct 0.584922
new_title_common_suffix_letters 0.515657
new_title_common_prefix_words 0.346420
new_title_common_suffix_words 0.222175
old_are_categories_equal 0.215149
new_url_common_prefix_words 0.214980
new_same_predicted_category 0.160420
new_same_phash 0.081744
old_has_different_gender 0.045671
new_same_handle 0.019155

holdout_results["20260214_consideration_50k"]["precision_thrs"]

   precision_threshold  precision_actual    recall  proba_threshold  above_threshold  pct_above_threshold
0                 0.90          0.900000  0.319810         0.752347           1200.0             2.726839
1                 0.95          0.950774  0.200178         0.891947            711.0             1.615652
2                 0.96          0.960000  0.170566         0.917640            600.0             1.363419
3                 0.97          0.970060  0.143915         0.935318            501.0             1.138455
4                 0.98          0.980000  0.087060         0.968717            300.0             0.681710
5                 0.99          0.990000  0.058632         0.982067            200.0             0.454473
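
The rows above pair each target precision with the recall reached at the corresponding probability cut-off. The sketch below shows one common way to compute such a row from labels and scores: scan cut-offs from the highest score down and keep the lowest one whose precision still meets the target. This is an assumed procedure, not necessarily the one used to produce the table, and the function name `recall_at_precision` is illustrative.

```python
def recall_at_precision(y_true, scores, target_precision):
    """Return (recall, threshold) for the lowest cut-off meeting the target.

    Recall never decreases as the cut-off is lowered, so the last
    qualifying cut-off in the scan is the one with the highest recall.
    Returns (0.0, None) when no cut-off reaches the target.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        return (0.0, None)
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    best = (0.0, None)
    true_pos = 0
    for i, (score, label) in enumerate(ranked, start=1):
        true_pos += label
        # Only evaluate at the end of a run of tied scores.
        if i < len(ranked) and ranked[i][0] == score:
            continue
        precision = true_pos / i
        if precision >= target_precision:
            best = (true_pos / total_pos, score)
    return best


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
print(recall_at_precision(labels, scores, 0.75))  # (1.0, 0.6)
```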

holdout_results["20260214_consideration_50k"]["recall_thrs"]

   recall_threshold  recall_actual  precision  proba_threshold  above_threshold  pct_above_threshold
0              0.90       0.900800   0.298323         0.045783          10197.0            23.171314
1              0.95       0.950252   0.213976         0.020878          14997.0            34.078669
2              0.96       0.960024   0.193645         0.016674          16742.0            38.043948
3              0.97       0.970388   0.172574         0.012527          18989.0            43.149953
4              0.98       0.980160   0.152093         0.009200          21763.0            49.453496
5              0.99       0.990228   0.130223         0.006002          25679.0            58.352080
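
The columns in these tables are related by simple arithmetic, which gives a quick consistency check. The sketch below uses the first row of the precision table and assumes that `pct_above_threshold` is the share of all holdout pairs scoring above the cut-off. The derived counts are implied by rounded table values, so they are approximate.

```python
# First row of the precision_thrs table.
above_threshold = 1200          # pairs scored above the cut-off
pct_above_threshold = 2.726839  # the same count as a percentage of the holdout
precision_actual = 0.90
recall = 0.319810

# Total holdout pairs implied by the count and its percentage.
holdout_size = above_threshold / (pct_above_threshold / 100)
# True positives among the pairs above the cut-off.
true_positives = above_threshold * precision_actual
# All positive pairs in the holdout, implied by recall = TP / positives.
positives = true_positives / recall

print(round(holdout_size), round(true_positives), round(positives))
# 44007 1080 3377
print(f"base rate: {positives / holdout_size:.2%}")
```

The implied positive share of the holdout helps put the precision figures in context, since precision depends on how rare matches are in the evaluated sample.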