---
tags:
- ml-intern
---
# Risk-Control Sequence Models: Research Report & Code Templates
## 📋 File Manifest

| File | Contents | Lines |
|---|---|---|
| `app_sequence_model.py` | App install sequence modeling: CoLES + GRU pretraining → fine-tuning → LightGBM → graph augmentation | ~870 |
| `credit_bureau_model.py` | Credit bureau modeling: TabM + PLE + FT-Transformer + LightGBM + threshold calibration + PSI monitoring | ~950 |
| `fusion_model.py` | Late fusion: combines the two models' outputs into the final decision | ~150 |
| `research_report.md` | Full literature survey (method comparison + hyperparameters + paper links) | detailed |
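The `credit_bureau_model.py` entry above mentions PSI monitoring. As a reference point, here is a minimal Population Stability Index sketch in plain NumPy; the 10-bin quantile scheme and the `1e-6` floor are common conventions, not necessarily what the script itself uses:

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, n_bins: int = 10) -> float:
    """Population Stability Index between a reference (expected) and a
    current (actual) score distribution. Rule of thumb: <0.1 stable,
    0.1-0.25 drifting, >0.25 significant shift."""
    # Bin edges come from the reference distribution's quantiles
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf          # cover out-of-range scores
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor both fractions to avoid log(0) and division by zero
    e_frac = np.clip(e_frac, 1e-6, None)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))
```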
## 🚀 Quick Start

```bash
pip install torch pytorch-lifestream scikit-learn lightgbm pandas numpy scipy
# Optional: pip install rtdl_num_embeddings rtdl_revisiting_models pytorch-tabular node2vec networkx
```

- Update the feature field names in `CONFIG` (see the sketch after this list)
- Replace the data-loading section with your own source
- Run
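A sketch of what those first two steps amount to; every key and column name below (`install_ts`, `app_id`, `is_default`, ...) is a placeholder for your own schema, not the scripts' actual fields:

```python
import pandas as pd

# Hypothetical CONFIG shape -- adjust keys to match your own tables.
CONFIG = {
    "user_id_col": "user_id",            # entity key shared across sources
    "app_sequence": {                    # consumed by app_sequence_model.py
        "time_col": "install_ts",
        "categorical": ["app_id", "app_category"],
    },
    "credit_bureau": {                   # consumed by credit_bureau_model.py
        "categorical": ["region", "product_type"],
        "numeric": ["credit_limit", "utilization", "n_inquiries_6m"],
    },
    "label_col": "is_default",
}

def load_data(path: str) -> pd.DataFrame:
    """Stub for step 2: swap in your own loader (parquet, Hive, etc.)."""
    return pd.read_parquet(path)
```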
## 📑 Key Papers

### App Sequence Modeling

| Method | Paper | Link |
|---|---|---|
| CoLES + GRU ⭐ | CoLES: Contrastive Learning for Event Sequences (SIGMOD 2022) | https://arxiv.org/abs/2002.08232 |
| Graph-Augmented CoLES | Beyond Isolated Clients (2026) | https://arxiv.org/abs/2604.09085 |
| LBSF (hierarchical sequence folding) | Long-term Behavior Sequence Folding (IEEE 2024) | https://arxiv.org/abs/2411.15056 |
| TabBERT | Tabular Transformers (IBM 2021) | https://arxiv.org/abs/2011.01843 |
| BehaveGPT | Foundation Model for User Behavior (2025) | https://arxiv.org/abs/2505.17631 |
| TransactionGPT | Visa 2025 | https://arxiv.org/abs/2511.08939 |
### Credit Bureau Modeling

| Method | Paper | Link |
|---|---|---|
| LightGBM/XGBoost ⭐ | Why tree-based models still outperform DL (NeurIPS 2022) | https://arxiv.org/abs/2207.08815 |
| TabM + PLE ⭐ | Advancing Tabular DL (ICLR 2025) | https://arxiv.org/abs/2410.24210 |
| FT-Transformer | Revisiting DL for Tabular Data (NeurIPS 2021) | https://arxiv.org/abs/2106.11959 |
| PLE numerical embeddings | On Embeddings for Numerical Features (2022) | https://arxiv.org/abs/2203.05556 |
| SAINT | Improved NN for Tabular Data (2021) | https://arxiv.org/abs/2106.01342 |
## 🔑 Key Takeaways

- App sequences: use GRU + CoLES contrastive learning (label-free pretraining → LightGBM); do not default to a Transformer (first sketch below)
- Credit bureau data: start from a LightGBM baseline, then add TabM + PLE and ensemble them 0.5 : 0.5 (second sketch below)
- Build the two models separately and combine them with late fusion (concatenate vectors → LightGBM stacking; third sketch below)
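A minimal sketch of the CoLES idea behind the first takeaway, in plain PyTorch rather than `pytorch-lifestream` (which the actual script uses): two random sub-sequences of the same user form a positive pair, other users in the batch are negatives, and a GRU encoder is trained with an InfoNCE-style loss. All dimensions and the sampling scheme are illustrative.

```python
import torch
import torch.nn.functional as F
from torch import nn

class GRUEncoder(nn.Module):
    """Embeds a padded batch of event-id sequences into one unit vector each."""
    def __init__(self, n_events: int, emb_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.emb = nn.Embedding(n_events, emb_dim, padding_idx=0)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True)

    def forward(self, x):                       # x: (B, T) int64, 0 = pad
        _, h = self.gru(self.emb(x))            # h: (1, B, hidden)
        return F.normalize(h.squeeze(0), dim=-1)

def random_subseq(x, min_len=8):
    """One CoLES-style view: a random contiguous slice. For brevity the same
    slice is used for the whole batch; CoLES samples per sequence."""
    T = x.size(1)
    lo = torch.randint(0, T - min_len, (1,)).item()
    hi = lo + torch.randint(min_len, T - lo + 1, (1,)).item()
    return x[:, lo:hi]

def info_nce(z1, z2, tau=0.1):
    """Two views of the same user are positives; the rest of the batch, negatives."""
    logits = z1 @ z2.t() / tau                  # (B, B) cosine similarities
    target = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, target)

# One illustrative pretraining step on fake data
enc = GRUEncoder(n_events=1000)
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
batch = torch.randint(1, 1000, (32, 64))        # 32 users, 64 events each
opt.zero_grad()
loss = info_nce(enc(random_subseq(batch)), enc(random_subseq(batch)))
loss.backward()
opt.step()
```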
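For the second takeaway, a hedged sketch of the LightGBM baseline and the 0.5 : 0.5 probability average; the synthetic data and hyperparameters are placeholders, and `p_tabm` stands in for the TabM + PLE model's predictions:

```python
import lightgbm as lgb
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# X: bureau feature matrix, y: default labels (random stand-ins here)
rng = np.random.default_rng(0)
X, y = rng.normal(size=(5000, 30)), rng.integers(0, 2, 5000)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, stratify=y)

model = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.05)
model.fit(X_tr, y_tr, eval_set=[(X_va, y_va)],
          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(0)])
p_lgbm = model.predict_proba(X_va)[:, 1]

# p_tabm would hold TabM+PLE probabilities for the same rows;
# the recommended ensemble is a plain 0.5:0.5 average.
p_tabm = p_lgbm                                 # placeholder
p_final = 0.5 * p_lgbm + 0.5 * p_tabm
print("val AUC:", roc_auc_score(y_va, p_final))
```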
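And for the third, what "concatenate vectors → LightGBM stacking" amounts to; the embedding shapes are placeholders for whatever the two upstream models actually emit:

```python
import lightgbm as lgb
import numpy as np

# Per-user vectors from the two upstream models (random stand-ins):
#   seq_emb  - CoLES/GRU user embedding, e.g. 128-d
#   bureau_p - credit model outputs, e.g. calibrated PD plus a logit
rng = np.random.default_rng(1)
seq_emb = rng.normal(size=(5000, 128))
bureau_p = rng.normal(size=(5000, 2))
y = rng.integers(0, 2, 5000)

X_fused = np.hstack([seq_emb, bureau_p])        # late fusion = concatenation
stacker = lgb.LGBMClassifier(n_estimators=300, learning_rate=0.05)
stacker.fit(X_fused, y)           # in practice: fit on out-of-fold predictions
final_pd = stacker.predict_proba(X_fused)[:, 1]
```

Fitting the stacker on out-of-fold upstream predictions (rather than in-sample ones, as in this toy example) is what keeps the fused score from overfitting to leakage.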
## Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yonghao/risk-control-sequence-models"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```
For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.