---
tags:
- ml-intern
---
# Risk-Control Sequence Models: Research Report & Code Templates
## 📋 File Manifest

| File | Contents | Lines |
|---|---|---|
| `app_sequence_model.py` | App install sequence modeling: CoLES + GRU pretraining → fine-tuning → LightGBM → graph augmentation | ~870 |
| `credit_bureau_model.py` | Credit bureau modeling: TabM + PLE + FT-Transformer + LightGBM + threshold calibration + PSI monitoring | ~950 |
| `fusion_model.py` | Late fusion: combines the two models' outputs into the final decision | ~150 |
| `research_report.md` | Full literature survey (method comparison + hyperparameters + paper links) | detailed |
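The `credit_bureau_model.py` entry above mentions PSI monitoring. As a reference point, here is a minimal Population Stability Index sketch in plain NumPy; the 10-bin quantile scheme and the `1e-6` floor are common conventions, not necessarily what the script itself uses:

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, n_bins: int = 10) -> float:
    """Population Stability Index between a reference (expected) and a
    current (actual) score distribution. Rule of thumb: <0.1 stable,
    0.1-0.25 drifting, >0.25 significant shift."""
    # Bin edges come from the reference distribution's quantiles
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf          # cover out-of-range scores
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor both fractions to avoid log(0) and division by zero
    e_frac = np.clip(e_frac, 1e-6, None)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))
```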
## 🚀 Quick Start

```bash
pip install torch pytorch-lifestream scikit-learn lightgbm pandas numpy scipy
# Optional: pip install rtdl_num_embeddings rtdl_revisiting_models pytorch-tabular node2vec networkx
```

- Update the feature field names in `CONFIG` (see the sketch after this list)
- Replace the data-loading section with your own source
- Run
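A sketch of what those first two steps amount to; every key and column name below (`install_ts`, `app_id`, `is_default`, ...) is a placeholder for your own schema, not the scripts' actual fields:

```python
import pandas as pd

# Hypothetical CONFIG shape -- adjust keys to match your own tables.
CONFIG = {
    "user_id_col": "user_id",            # entity key shared across sources
    "app_sequence": {                    # consumed by app_sequence_model.py
        "time_col": "install_ts",
        "categorical": ["app_id", "app_category"],
    },
    "credit_bureau": {                   # consumed by credit_bureau_model.py
        "categorical": ["region", "product_type"],
        "numeric": ["credit_limit", "utilization", "n_inquiries_6m"],
    },
    "label_col": "is_default",
}

def load_data(path: str) -> pd.DataFrame:
    """Stub for step 2: swap in your own loader (parquet, Hive, etc.)."""
    return pd.read_parquet(path)
```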
## 📑 Key Papers

### App Sequence Modeling

| Method | Paper | Link |
|---|---|---|
| CoLES + GRU ⭐ | CoLES: Contrastive Learning for Event Sequences (SIGMOD 2022) | https://arxiv.org/abs/2002.08232 |
| Graph-Augmented CoLES | Beyond Isolated Clients (2026) | https://arxiv.org/abs/2604.09085 |
| LBSF (hierarchical sequence folding) | Long-term Behavior Sequence Folding (IEEE 2024) | https://arxiv.org/abs/2411.15056 |
| TabBERT | Tabular Transformers (IBM 2021) | https://arxiv.org/abs/2011.01843 |
| BehaveGPT | Foundation Model for User Behavior (2025) | https://arxiv.org/abs/2505.17631 |
| TransactionGPT | Visa 2025 | https://arxiv.org/abs/2511.08939 |
### Credit Bureau Modeling

| Method | Paper | Link |
|---|---|---|
| LightGBM/XGBoost ⭐ | Why tree-based models still outperform DL (NeurIPS 2022) | https://arxiv.org/abs/2207.08815 |
| TabM + PLE ⭐ | Advancing Tabular DL (ICLR 2025) | https://arxiv.org/abs/2410.24210 |
| FT-Transformer | Revisiting DL for Tabular Data (NeurIPS 2021) | https://arxiv.org/abs/2106.11959 |
| PLE numerical embeddings | On Embeddings for Numerical Features (2022) | https://arxiv.org/abs/2203.05556 |
| SAINT | Improved NN for Tabular Data (2021) | https://arxiv.org/abs/2106.01342 |
## 🔑 Key Takeaways

- App sequences: use GRU + CoLES contrastive learning (label-free pretraining → LightGBM); do not default to a Transformer (first sketch below)
- Credit bureau data: start from a LightGBM baseline, then add TabM + PLE and ensemble them 0.5 : 0.5 (second sketch below)
- Build the two models separately and combine them with late fusion (concatenate vectors → LightGBM stacking; third sketch below)
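A minimal sketch of the CoLES idea behind the first takeaway, in plain PyTorch rather than `pytorch-lifestream` (which the actual script uses): two random sub-sequences of the same user form a positive pair, other users in the batch are negatives, and a GRU encoder is trained with an InfoNCE-style loss. All dimensions and the sampling scheme are illustrative.

```python
import torch
import torch.nn.functional as F
from torch import nn

class GRUEncoder(nn.Module):
    """Embeds a padded batch of event-id sequences into one unit vector each."""
    def __init__(self, n_events: int, emb_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.emb = nn.Embedding(n_events, emb_dim, padding_idx=0)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True)

    def forward(self, x):                       # x: (B, T) int64, 0 = pad
        _, h = self.gru(self.emb(x))            # h: (1, B, hidden)
        return F.normalize(h.squeeze(0), dim=-1)

def random_subseq(x, min_len=8):
    """One CoLES-style view: a random contiguous slice. For brevity the same
    slice is used for the whole batch; CoLES samples per sequence."""
    T = x.size(1)
    lo = torch.randint(0, T - min_len, (1,)).item()
    hi = lo + torch.randint(min_len, T - lo + 1, (1,)).item()
    return x[:, lo:hi]

def info_nce(z1, z2, tau=0.1):
    """Two views of the same user are positives; the rest of the batch, negatives."""
    logits = z1 @ z2.t() / tau                  # (B, B) cosine similarities
    target = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, target)

# One illustrative pretraining step on fake data
enc = GRUEncoder(n_events=1000)
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
batch = torch.randint(1, 1000, (32, 64))        # 32 users, 64 events each
opt.zero_grad()
loss = info_nce(enc(random_subseq(batch)), enc(random_subseq(batch)))
loss.backward()
opt.step()
```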
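For the second takeaway, a hedged sketch of the LightGBM baseline and the 0.5 : 0.5 probability average; the synthetic data and hyperparameters are placeholders, and `p_tabm` stands in for the TabM + PLE model's predictions:

```python
import lightgbm as lgb
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# X: bureau feature matrix, y: default labels (random stand-ins here)
rng = np.random.default_rng(0)
X, y = rng.normal(size=(5000, 30)), rng.integers(0, 2, 5000)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, stratify=y)

model = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.05)
model.fit(X_tr, y_tr, eval_set=[(X_va, y_va)],
          callbacks=[lgb.early_stopping(50), lgb.log_evaluation(0)])
p_lgbm = model.predict_proba(X_va)[:, 1]

# p_tabm would hold TabM+PLE probabilities for the same rows;
# the recommended ensemble is a plain 0.5:0.5 average.
p_tabm = p_lgbm                                 # placeholder
p_final = 0.5 * p_lgbm + 0.5 * p_tabm
print("val AUC:", roc_auc_score(y_va, p_final))
```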
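And for the third, what "concatenate vectors → LightGBM stacking" amounts to; the embedding shapes are placeholders for whatever the two upstream models actually emit:

```python
import lightgbm as lgb
import numpy as np

# Per-user vectors from the two upstream models (random stand-ins):
#   seq_emb  - CoLES/GRU user embedding, e.g. 128-d
#   bureau_p - credit model outputs, e.g. calibrated PD plus a logit
rng = np.random.default_rng(1)
seq_emb = rng.normal(size=(5000, 128))
bureau_p = rng.normal(size=(5000, 2))
y = rng.integers(0, 2, 5000)

X_fused = np.hstack([seq_emb, bureau_p])        # late fusion = concatenation
stacker = lgb.LGBMClassifier(n_estimators=300, learning_rate=0.05)
stacker.fit(X_fused, y)           # in practice: fit on out-of-fold predictions
final_pd = stacker.predict_proba(X_fused)[:, 1]
```

Fitting the stacker on out-of-fold upstream predictions (rather than in-sample ones, as in this toy example) is what keeps the fused score from overfitting to leakage.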
## Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yonghao/risk-control-sequence-models"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```
For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.