DevikaJ2005 committed on
Commit
e4ffe61
·
1 Parent(s): ea9eade

Refine FraudShield snapshot pipeline

README.md CHANGED
@@ -1,19 +1,40 @@
1
  # FraudShield
2
 
3
- FraudShield is an OpenEnv environment for marketplace fraud review. An agent receives one e-commerce transaction at a time, decides whether it is `fraud` or `legitimate`, and gets dense reward shaped by business impact, confidence calibration, and correctness.
4
 
5
- The environment is built from a compact task bundle derived from the Kaggle credit card fraud dataset. The bundle is committed as `data/fraudshield_cases.json`, so the repo stays self-contained for Docker and Hugging Face Spaces while still grounding the tasks in real fraud data.
6
 
7
- ## Why this environment
8
 
9
- Real commerce teams review risky orders all day: new sellers, chargeback-heavy merchants, reused devices, flash-sale spikes, and account-takeover style behavior. FraudShield turns that workflow into an agent training environment with:
10
 
11
- - A real-world domain instead of a toy game
12
- - Typed `Action`, `Observation`, and `Reward` models
13
- - `reset()`, `step()`, and `state()` APIs
14
- - Three graded tasks with deterministic scoring from `0.0` to `1.0`
15
- - Dense step rewards with partial progress signals
16
- - A root `inference.py` baseline compatible with the required OpenAI client flow
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
17
 
18
  ## Tasks
19
 
@@ -23,11 +44,9 @@ Real commerce teams review risky orders all day: new sellers, chargeback-heavy m
23
  | Medium | 36 | Balance fraud capture with calibration | No single signal is decisive; tradeoffs matter |
24
  | Hard | 48 | Handle coordinated abuse and edge-case legitimate traffic | Fraud rings and flash-sale behavior intentionally overlap |
25
 
26
- Each task uses a deterministic grader in [graders.py](/c:/Users/Jayashanker/Downloads/fraudshield_kaggle_ready/fraudshield_kaggle/graders.py).
27
-
28
  ## Action space
29
 
30
- Agents emit a single [FraudCheckAction](/c:/Users/Jayashanker/Downloads/fraudshield_kaggle_ready/fraudshield_kaggle/models.py):
31
 
32
  ```python
33
  FraudCheckAction(
@@ -40,15 +59,15 @@ FraudCheckAction(
40
 
41
  ## Observation space
42
 
43
- Each step returns a [FraudCheckObservation](/c:/Users/Jayashanker/Downloads/fraudshield_kaggle_ready/fraudshield_kaggle/models.py) with structured transaction facts and rolling context:
44
 
45
- - Transaction facts: amount, seller age, buyer age, payment method, geo mismatch, rating, prior flags, chargeback rate, shared-device counts, same-address velocity, and more
46
- - Historical context: seller velocity, linked cards, refund counts, cluster alert score, and task-specific notes
47
- - Task metadata: difficulty and episode step
48
 
49
  ## Reward design
50
 
51
- Rewards are dense and cost-sensitive in [fraudshield_env.py](/c:/Users/Jayashanker/Downloads/fraudshield_kaggle_ready/fraudshield_kaggle/fraudshield_env.py):
52
 
53
  - Correct fraud catches receive the strongest positive reward
54
  - Correct legitimate approvals still earn positive reward, but less than catching fraud
@@ -56,11 +75,9 @@ Rewards are dense and cost-sensitive in [fraudshield_env.py](/c:/Users/Jayashank
56
  - Confidence is rewarded when it matches hidden case difficulty and punished when it is overconfident
57
  - Submitting the wrong `transaction_id` adds an extra penalty
58
 
59
- This gives the agent signal across the full trajectory instead of only at episode end.
60
-
61
  ## Graders
62
 
63
- The three task graders are deterministic and return `0.0` to `1.0`.
64
 
65
  - Easy: accuracy, F1, recall, and specificity
66
  - Medium: F1, ROC-AUC, precision, and confidence calibration
@@ -68,10 +85,10 @@ The three task graders are deterministic and return `0.0` to `1.0`.
68
 
69
  ## Baseline inference
70
 
71
- The required root script is [inference.py](/c:/Users/Jayashanker/Downloads/fraudshield_kaggle_ready/fraudshield_kaggle/inference.py).
72
 
73
  - Competition mode: if `API_BASE_URL`, `MODEL_NAME`, and `HF_TOKEN` are set, it uses the OpenAI client against that endpoint
74
- - Local smoke-test mode: if those variables are missing, it falls back to a deterministic heuristic agent so the repo can still be verified offline
75
 
76
  Required environment variables for the competition path:
77
 
@@ -87,11 +104,11 @@ Run it with:
87
  python inference.py
88
  ```
89
 
90
- The script writes `fraudshield_baseline_results.json` in the project root.
91
 
92
- ### Local offline baseline
93
 
94
- With the deterministic heuristic fallback and seed `42`, the current local smoke-test scores are:
95
 
96
  | Task | Score |
97
  | --- | ---: |
@@ -103,23 +120,23 @@ With the deterministic heuristic fallback and seed `42`, the current local smoke
103
  ## Project layout
104
 
105
  ```text
106
- fraudshield_kaggle/
107
- ├── data/
108
- └── fraudshield_cases.json
109
- ├── server/
110
- ├── __init__.py
111
- └── app.py
112
- ├── data_loader.py
113
- ├── download_kaggle_data.py
114
- ├── Dockerfile
115
- ├── fraudshield_env.py
116
- ├── graders.py
117
- ├── inference.py
118
- ├── inference_llm.py
119
- ├── llm_agent.py
120
- ├── models.py
121
- ├── openenv.yaml
122
- └── pyproject.toml
123
  ```
124
 
125
  ## Setup
@@ -130,14 +147,16 @@ Install the project:
130
  python -m pip install -e .
131
  ```
132
 
133
- Optional: if you want to regenerate the compact bundle from the original Kaggle CSV instead of using the committed task file:
134
 
135
  ```bash
136
  python -m pip install -e ".[data]"
137
  python download_kaggle_data.py
138
  ```
139
 
140
- ## Running the environment locally
 
 
141
 
142
  ### Python API
143
 
@@ -146,7 +165,7 @@ from fraudshield_env import FraudShieldEnvironment
146
  from models import DecisionEnum, FraudCheckAction
147
 
148
  env = FraudShieldEnvironment(data_path="data", seed=42)
149
- env.load_kaggle_data()
150
  reset_result = env.reset("medium")
151
 
152
  action = FraudCheckAction(
@@ -184,16 +203,7 @@ docker build -t fraudshield .
184
  docker run -p 7860:7860 fraudshield
185
  ```
186
 
187
- The container listens on port `7860`, which is the expected default for Hugging Face Docker Spaces.
188
-
189
- ## Hugging Face Spaces
190
-
191
- This repo is ready for a Docker Space:
192
-
193
- - Include `openenv` in the Space tags
194
- - Use the provided `Dockerfile`
195
- - Expose the app on port `7860`
196
- - Set `API_BASE_URL`, `MODEL_NAME`, and `HF_TOKEN` in the Space secrets if you want the LLM baseline to run there
197
 
198
  ## Validation checklist
199
 
@@ -213,6 +223,6 @@ Then verify:
213
 
214
  ## Notes
215
 
216
- - The committed task bundle is small on purpose so the repo stays deployable without external downloads
217
- - The source CSV is optional and only needed if you want to regenerate the bundle
218
- - `inference_llm.py` is kept as a backward-compatible wrapper to the main baseline entrypoint
 
1
  # FraudShield
2
 
3
+ FraudShield is an OpenEnv environment for marketplace fraud review. An agent inspects one transaction at a time, predicts whether it is `fraud` or `legitimate`, and receives dense reward shaped by business impact, confidence calibration, and correctness.
4
 
5
+ The environment is grounded in real public fraud data, but it does not fetch live records during `reset()` or `step()`. Instead, it uses a frozen, versioned snapshot stored in `data/fraudshield_cases.json`. That gives you real-world grounding with deterministic grading, fast Docker startup, and reproducible evaluation on Hugging Face Spaces.
6
 
7
+ ## Why this design
8
 
9
+ For an OpenEnv submission, the safest pattern is:
10
 
11
+ - Fetch or refresh public source data offline
12
+ - Build a deterministic FraudShield snapshot
13
+ - Commit the snapshot used for evaluation
14
+ - Keep the environment runtime fully offline
15
+
16
+ That avoids runtime API failures, privacy issues, and non-reproducible scores.
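
The offline build step described above can be illustrated with a minimal sketch. It is not the code in `data_loader.py`; the function names `build_snapshot` and `snapshot_digest` and the toy `rows` are hypothetical. It only shows the general idea: a seeded local RNG plus a stable hash makes it possible to confirm that two builds from the same source produce the same snapshot.

```python
import hashlib
import json
import random


def build_snapshot(rows, task_sizes, seed=42):
    """Hypothetical helper: split a seeded shuffle of rows into per-task lists."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    pool = list(rows)
    rng.shuffle(pool)
    tasks, start = {}, 0
    for name, size in task_sizes.items():
        tasks[name] = pool[start:start + size]
        start += size
    return {"seed": seed, "tasks": tasks}


def snapshot_digest(snapshot):
    """Hypothetical helper: stable SHA-256 over a canonical JSON encoding."""
    payload = json.dumps(snapshot, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Toy source rows standing in for the real CSV (an assumption for illustration).
rows = [{"id": i, "amount": float(i)} for i in range(200)]
sizes = {"easy": 24, "medium": 36, "hard": 48}

first = build_snapshot(rows, sizes)
second = build_snapshot(rows, sizes)
same_build = snapshot_digest(first) == snapshot_digest(second)
print("reproducible:", same_build)
```

Comparing digests rather than whole files keeps the check cheap enough to run in CI before a rebuilt snapshot is committed.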
17
+
18
+ ## Real-world data strategy
19
+
20
+ FraudShield currently builds its snapshot from the public Kaggle / ULB credit card fraud dataset:
21
+
22
+ - Source ID: `kaggle_creditcardfraud`
23
+ - Dataset: `mlg-ulb/creditcardfraud`
24
+ - URL: `https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud`
25
+
26
+ The loader is now source-agnostic in code:
27
+
28
+ - `data_loader.py` exposes a public-source snapshot pipeline
29
+ - `download_kaggle_data.py` refreshes the local source CSV and rebuilds the frozen snapshot
30
+ - `fraudshield_env.py` reads the snapshot only at runtime
31
+
32
+ The checked-in snapshot currently reports:
33
+
34
+ - Snapshot ID: `fraudshield-realworld-v2`
35
+ - Schema version: `2.0`
36
+ - Seed: `42`
37
+ - Task sizes: easy `24`, medium `36`, hard `48`
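
As a rough illustration, the reported task sizes could be checked against the snapshot file with a few lines of Python. The sketch relies only on the top-level `tasks` mapping visible in `data/fraudshield_cases.json`; the helper names are hypothetical, and the demo uses an in-memory stand-in rather than the real file.

```python
import json


def load_snapshot(path):
    """Read a snapshot JSON file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def task_sizes(snapshot):
    """Count the cases stored under each task name."""
    return {name: len(cases) for name, cases in snapshot["tasks"].items()}


def check_task_sizes(snapshot, expected):
    """Return a list of human-readable mismatches (empty when all sizes match)."""
    actual = task_sizes(snapshot)
    return [
        f"{name}: expected {want}, found {actual.get(name, 0)}"
        for name, want in expected.items()
        if actual.get(name, 0) != want
    ]


# In-memory stand-in with one hard case missing, to show a failing check.
demo = {"tasks": {"easy": [{}] * 24, "medium": [{}] * 36, "hard": [{}] * 47}}
problems = check_task_sizes(demo, {"easy": 24, "medium": 36, "hard": 48})
print(problems)
```

Against the real file, the same check would be `check_task_sizes(load_snapshot("data/fraudshield_cases.json"), {"easy": 24, "medium": 36, "hard": 48})`.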
38
 
39
  ## Tasks
40
 
 
44
  | Medium | 36 | Balance fraud capture with calibration | No single signal is decisive; tradeoffs matter |
45
  | Hard | 48 | Handle coordinated abuse and edge-case legitimate traffic | Fraud rings and flash-sale behavior intentionally overlap |
46
 
 
 
47
  ## Action space
48
 
49
+ Agents emit a single `FraudCheckAction`:
50
 
51
  ```python
52
  FraudCheckAction(
 
59
 
60
  ## Observation space
61
 
62
+ Each step returns a `FraudCheckObservation` with:
63
 
64
+ - Structured transaction facts such as amount, seller age, buyer age, geo mismatch, rating, prior flags, chargeback rate, shared-device counts, and address velocity
65
+ - Historical context such as seller velocity, linked cards, refund counts, cluster alert score, and source snapshot metadata
66
+ - Task metadata including difficulty and episode step
67
 
68
  ## Reward design
69
 
70
+ Rewards in `fraudshield_env.py` are dense and cost-sensitive:
71
 
72
  - Correct fraud catches receive the strongest positive reward
73
  - Correct legitimate approvals still earn positive reward, but less than catching fraud
 
75
  - Confidence is rewarded when it matches hidden case difficulty and punished when it is overconfident
76
  - Submitting the wrong `transaction_id` adds an extra penalty
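
To make the shape of these rules concrete, here is a minimal sketch of a cost-sensitive step reward. It is not the implementation in `fraudshield_env.py`: every weight, the `difficulty` input, and the names `RewardWeights` and `step_reward` are assumptions chosen only to reproduce the ordering described in the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights; the real values are not given in the README."""
    fraud_caught: float = 1.0        # strongest positive reward
    legit_approved: float = 0.5      # positive, but smaller
    wrong_decision: float = -1.0     # base penalty for any wrong label
    calibration_bonus: float = 0.2   # paid when confidence fits the case
    overconfidence_penalty: float = 0.5
    wrong_id_penalty: float = 0.25


def step_reward(predicted, actual, confidence, difficulty, id_matches,
                weights=RewardWeights()):
    """Score one decision; confidence and difficulty are floats in [0, 1]."""
    correct = predicted == actual
    if correct:
        reward = weights.fraud_caught if actual == "fraud" else weights.legit_approved
    else:
        reward = weights.wrong_decision

    # Assumed calibration target: harder cases warrant lower confidence.
    target = 1.0 - difficulty
    if correct:
        reward += weights.calibration_bonus * (1.0 - abs(confidence - target))
    else:
        reward -= weights.overconfidence_penalty * max(0.0, confidence - target)

    if not id_matches:
        reward -= weights.wrong_id_penalty
    return reward


caught = step_reward("fraud", "fraud", 0.9, 0.1, True)
approved = step_reward("legitimate", "legitimate", 0.9, 0.1, True)
missed = step_reward("legitimate", "fraud", 0.9, 0.5, True)
print(caught, approved, missed)
```

Because every step returns a value, an agent gets feedback on each decision rather than only on the final graded score.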
77
 
 
 
78
  ## Graders
79
 
80
+ The three task graders in `graders.py` are deterministic and return scores from `0.0` to `1.0`.
81
 
82
  - Easy: accuracy, F1, recall, and specificity
83
  - Medium: F1, ROC-AUC, precision, and confidence calibration
 
85
 
86
  ## Baseline inference
87
 
88
+ The required root script is `inference.py`.
89
 
90
  - Competition mode: if `API_BASE_URL`, `MODEL_NAME`, and `HF_TOKEN` are set, it uses the OpenAI client against that endpoint
91
+ - Local smoke-test mode: if those variables are missing, it falls back to a deterministic heuristic agent
92
 
93
  Required environment variables for the competition path:
94
 
 
104
  python inference.py
105
  ```
106
 
107
+ The script writes `fraudshield_baseline_results.json` to the project root.
108
 
109
+ ### Tested local baseline
110
 
111
+ I reran the baseline after the snapshot-loader changes. With the deterministic heuristic fallback and seed `42`, the tested local scores are:
112
 
113
  | Task | Score |
114
  | --- | ---: |
 
120
  ## Project layout
121
 
122
  ```text
123
+ fraudshield/
124
+ |-- data/
125
+ | |-- fraudshield_cases.json
126
+ |-- server/
127
+ | |-- __init__.py
128
+ | `-- app.py
129
+ |-- data_loader.py
130
+ |-- download_kaggle_data.py
131
+ |-- Dockerfile
132
+ |-- fraudshield_env.py
133
+ |-- graders.py
134
+ |-- inference.py
135
+ |-- inference_llm.py
136
+ |-- llm_agent.py
137
+ |-- models.py
138
+ |-- openenv.yaml
139
+ `-- pyproject.toml
140
  ```
141
 
142
  ## Setup
 
147
  python -m pip install -e .
148
  ```
149
 
150
+ If you want to rebuild the frozen snapshot from the public source CSV:
151
 
152
  ```bash
153
  python -m pip install -e ".[data]"
154
  python download_kaggle_data.py
155
  ```
156
 
157
+ If `data/creditcard.csv` already exists locally, the script rebuilds the snapshot without needing to download again.
158
+
159
+ ## Running locally
160
 
161
  ### Python API
162
 
 
165
  from models import DecisionEnum, FraudCheckAction
166
 
167
  env = FraudShieldEnvironment(data_path="data", seed=42)
168
+ env.load_data()
169
  reset_result = env.reset("medium")
170
 
171
  action = FraudCheckAction(
 
203
  docker run -p 7860:7860 fraudshield
204
  ```
205
 
206
+ The container listens on port `7860`, which matches Hugging Face Docker Spaces expectations.
 
 
 
 
 
 
 
 
 
207
 
208
  ## Validation checklist
209
 
 
223
 
224
  ## Notes
225
 
226
+ - Runtime uses the committed snapshot only
227
+ - Public source refresh is optional and intended for offline rebuilds
228
+ - `inference_llm.py` remains as a thin wrapper to `inference.py`
data/fraudshield_cases.json CHANGED
@@ -1,6 +1,21 @@
1
  {
2
- "seed": 42,
3
- "source": "Kaggle creditcardfraud",
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4
  "tasks": {
5
  "easy": [
6
  {
@@ -33,6 +48,8 @@
33
  },
34
  "historical_context": {
35
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
36
  "seller_transactions_1h": 4,
37
  "linked_cards_7d": 3,
38
  "recent_refunds_7d": 2,
@@ -72,6 +89,8 @@
72
  },
73
  "historical_context": {
74
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
75
  "seller_transactions_1h": 13,
76
  "linked_cards_7d": 1,
77
  "recent_refunds_7d": 1,
@@ -111,6 +130,8 @@
111
  },
112
  "historical_context": {
113
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
114
  "seller_transactions_1h": 15,
115
  "linked_cards_7d": 3,
116
  "recent_refunds_7d": 1,
@@ -150,6 +171,8 @@
150
  },
151
  "historical_context": {
152
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
153
  "seller_transactions_1h": 4,
154
  "linked_cards_7d": 2,
155
  "recent_refunds_7d": 0,
@@ -189,6 +212,8 @@
189
  },
190
  "historical_context": {
191
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
192
  "seller_transactions_1h": 2,
193
  "linked_cards_7d": 2,
194
  "recent_refunds_7d": 0,
@@ -228,6 +253,8 @@
228
  },
229
  "historical_context": {
230
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
231
  "seller_transactions_1h": 16,
232
  "linked_cards_7d": 3,
233
  "recent_refunds_7d": 1,
@@ -267,6 +294,8 @@
267
  },
268
  "historical_context": {
269
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
270
  "seller_transactions_1h": 15,
271
  "linked_cards_7d": 2,
272
  "recent_refunds_7d": 2,
@@ -306,6 +335,8 @@
306
  },
307
  "historical_context": {
308
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
309
  "seller_transactions_1h": 16,
310
  "linked_cards_7d": 1,
311
  "recent_refunds_7d": 1,
@@ -345,6 +376,8 @@
345
  },
346
  "historical_context": {
347
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
348
  "seller_transactions_1h": 17,
349
  "linked_cards_7d": 4,
350
  "recent_refunds_7d": 1,
@@ -384,6 +417,8 @@
384
  },
385
  "historical_context": {
386
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
387
  "seller_transactions_1h": 5,
388
  "linked_cards_7d": 3,
389
  "recent_refunds_7d": 2,
@@ -423,6 +458,8 @@
423
  },
424
  "historical_context": {
425
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
426
  "seller_transactions_1h": 18,
427
  "linked_cards_7d": 3,
428
  "recent_refunds_7d": 1,
@@ -462,6 +499,8 @@
462
  },
463
  "historical_context": {
464
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
465
  "seller_transactions_1h": 15,
466
  "linked_cards_7d": 3,
467
  "recent_refunds_7d": 1,
@@ -501,6 +540,8 @@
501
  },
502
  "historical_context": {
503
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
504
  "seller_transactions_1h": 8,
505
  "linked_cards_7d": 1,
506
  "recent_refunds_7d": 1,
@@ -540,6 +581,8 @@
540
  },
541
  "historical_context": {
542
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
543
  "seller_transactions_1h": 8,
544
  "linked_cards_7d": 2,
545
  "recent_refunds_7d": 1,
@@ -579,6 +622,8 @@
579
  },
580
  "historical_context": {
581
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
582
  "seller_transactions_1h": 6,
583
  "linked_cards_7d": 1,
584
  "recent_refunds_7d": 2,
@@ -618,6 +663,8 @@
618
  },
619
  "historical_context": {
620
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
621
  "seller_transactions_1h": 9,
622
  "linked_cards_7d": 2,
623
  "recent_refunds_7d": 1,
@@ -657,6 +704,8 @@
657
  },
658
  "historical_context": {
659
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
660
  "seller_transactions_1h": 11,
661
  "linked_cards_7d": 1,
662
  "recent_refunds_7d": 1,
@@ -696,6 +745,8 @@
696
  },
697
  "historical_context": {
698
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
699
  "seller_transactions_1h": 2,
700
  "linked_cards_7d": 1,
701
  "recent_refunds_7d": 1,
@@ -735,6 +786,8 @@
735
  },
736
  "historical_context": {
737
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
738
  "seller_transactions_1h": 10,
739
  "linked_cards_7d": 4,
740
  "recent_refunds_7d": 0,
@@ -774,6 +827,8 @@
774
  },
775
  "historical_context": {
776
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
777
  "seller_transactions_1h": 1,
778
  "linked_cards_7d": 3,
779
  "recent_refunds_7d": 1,
@@ -813,6 +868,8 @@
813
  },
814
  "historical_context": {
815
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
816
  "seller_transactions_1h": 5,
817
  "linked_cards_7d": 3,
818
  "recent_refunds_7d": 2,
@@ -852,6 +909,8 @@
852
  },
853
  "historical_context": {
854
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
855
  "seller_transactions_1h": 17,
856
  "linked_cards_7d": 2,
857
  "recent_refunds_7d": 2,
@@ -891,6 +950,8 @@
891
  },
892
  "historical_context": {
893
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
894
  "seller_transactions_1h": 14,
895
  "linked_cards_7d": 3,
896
  "recent_refunds_7d": 1,
@@ -930,6 +991,8 @@
930
  },
931
  "historical_context": {
932
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
 
 
933
  "seller_transactions_1h": 10,
934
  "linked_cards_7d": 2,
935
  "recent_refunds_7d": 1,
@@ -971,6 +1034,8 @@
971
  },
972
  "historical_context": {
973
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
974
  "seller_transactions_1h": 11,
975
  "linked_cards_7d": 4,
976
  "recent_refunds_7d": 5,
@@ -1010,6 +1075,8 @@
1010
  },
1011
  "historical_context": {
1012
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1013
  "seller_transactions_1h": 15,
1014
  "linked_cards_7d": 3,
1015
  "recent_refunds_7d": 2,
@@ -1049,6 +1116,8 @@
1049
  },
1050
  "historical_context": {
1051
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1052
  "seller_transactions_1h": 6,
1053
  "linked_cards_7d": 5,
1054
  "recent_refunds_7d": 5,
@@ -1088,6 +1157,8 @@
1088
  },
1089
  "historical_context": {
1090
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1091
  "seller_transactions_1h": 3,
1092
  "linked_cards_7d": 7,
1093
  "recent_refunds_7d": 1,
@@ -1127,6 +1198,8 @@
1127
  },
1128
  "historical_context": {
1129
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1130
  "seller_transactions_1h": 17,
1131
  "linked_cards_7d": 6,
1132
  "recent_refunds_7d": 0,
@@ -1166,6 +1239,8 @@
1166
  },
1167
  "historical_context": {
1168
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1169
  "seller_transactions_1h": 17,
1170
  "linked_cards_7d": 6,
1171
  "recent_refunds_7d": 6,
@@ -1205,6 +1280,8 @@
1205
  },
1206
  "historical_context": {
1207
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1208
  "seller_transactions_1h": 4,
1209
  "linked_cards_7d": 7,
1210
  "recent_refunds_7d": 1,
@@ -1244,6 +1321,8 @@
1244
  },
1245
  "historical_context": {
1246
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1247
  "seller_transactions_1h": 4,
1248
  "linked_cards_7d": 8,
1249
  "recent_refunds_7d": 2,
@@ -1283,6 +1362,8 @@
1283
  },
1284
  "historical_context": {
1285
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1286
  "seller_transactions_1h": 10,
1287
  "linked_cards_7d": 6,
1288
  "recent_refunds_7d": 1,
@@ -1322,6 +1403,8 @@
1322
  },
1323
  "historical_context": {
1324
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1325
  "seller_transactions_1h": 12,
1326
  "linked_cards_7d": 5,
1327
  "recent_refunds_7d": 6,
@@ -1361,6 +1444,8 @@
1361
  },
1362
  "historical_context": {
1363
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1364
  "seller_transactions_1h": 18,
1365
  "linked_cards_7d": 6,
1366
  "recent_refunds_7d": 3,
@@ -1400,6 +1485,8 @@
1400
  },
1401
  "historical_context": {
1402
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1403
  "seller_transactions_1h": 6,
1404
  "linked_cards_7d": 2,
1405
  "recent_refunds_7d": 1,
@@ -1439,6 +1526,8 @@
1439
  },
1440
  "historical_context": {
1441
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1442
  "seller_transactions_1h": 8,
1443
  "linked_cards_7d": 5,
1444
  "recent_refunds_7d": 2,
@@ -1478,6 +1567,8 @@
1478
  },
1479
  "historical_context": {
1480
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1481
  "seller_transactions_1h": 10,
1482
  "linked_cards_7d": 3,
1483
  "recent_refunds_7d": 6,
@@ -1517,6 +1608,8 @@
1517
  },
1518
  "historical_context": {
1519
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1520
  "seller_transactions_1h": 3,
1521
  "linked_cards_7d": 5,
1522
  "recent_refunds_7d": 4,
@@ -1556,6 +1649,8 @@
1556
  },
1557
  "historical_context": {
1558
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1559
  "seller_transactions_1h": 15,
1560
  "linked_cards_7d": 1,
1561
  "recent_refunds_7d": 5,
@@ -1595,6 +1690,8 @@
1595
  },
1596
  "historical_context": {
1597
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1598
  "seller_transactions_1h": 17,
1599
  "linked_cards_7d": 5,
1600
  "recent_refunds_7d": 5,
@@ -1634,6 +1731,8 @@
1634
  },
1635
  "historical_context": {
1636
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1637
  "seller_transactions_1h": 5,
1638
  "linked_cards_7d": 5,
1639
  "recent_refunds_7d": 1,
@@ -1673,6 +1772,8 @@
1673
  },
1674
  "historical_context": {
1675
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1676
  "seller_transactions_1h": 17,
1677
  "linked_cards_7d": 2,
1678
  "recent_refunds_7d": 5,
@@ -1712,6 +1813,8 @@
1712
  },
1713
  "historical_context": {
1714
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1715
  "seller_transactions_1h": 1,
1716
  "linked_cards_7d": 4,
1717
  "recent_refunds_7d": 6,
@@ -1751,6 +1854,8 @@
1751
  },
1752
  "historical_context": {
1753
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1754
  "seller_transactions_1h": 14,
1755
  "linked_cards_7d": 6,
1756
  "recent_refunds_7d": 3,
@@ -1790,6 +1895,8 @@
1790
  },
1791
  "historical_context": {
1792
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1793
  "seller_transactions_1h": 18,
1794
  "linked_cards_7d": 2,
1795
  "recent_refunds_7d": 4,
@@ -1829,6 +1936,8 @@
1829
  },
1830
  "historical_context": {
1831
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1832
  "seller_transactions_1h": 10,
1833
  "linked_cards_7d": 1,
1834
  "recent_refunds_7d": 3,
@@ -1868,6 +1977,8 @@
1868
  },
1869
  "historical_context": {
1870
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1871
  "seller_transactions_1h": 13,
1872
  "linked_cards_7d": 2,
1873
  "recent_refunds_7d": 6,
@@ -1907,6 +2018,8 @@
1907
  },
1908
  "historical_context": {
1909
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1910
  "seller_transactions_1h": 15,
1911
  "linked_cards_7d": 3,
1912
  "recent_refunds_7d": 6,
@@ -1946,6 +2059,8 @@
1946
  },
1947
  "historical_context": {
1948
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1949
  "seller_transactions_1h": 7,
1950
  "linked_cards_7d": 6,
1951
  "recent_refunds_7d": 3,
@@ -1985,6 +2100,8 @@
1985
  },
1986
  "historical_context": {
1987
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
1988
  "seller_transactions_1h": 14,
1989
  "linked_cards_7d": 1,
1990
  "recent_refunds_7d": 2,
@@ -2024,6 +2141,8 @@
2024
  },
2025
  "historical_context": {
2026
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
2027
  "seller_transactions_1h": 8,
2028
  "linked_cards_7d": 4,
2029
  "recent_refunds_7d": 5,
@@ -2063,6 +2182,8 @@
2063
  },
2064
  "historical_context": {
2065
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
2066
  "seller_transactions_1h": 5,
2067
  "linked_cards_7d": 4,
2068
  "recent_refunds_7d": 1,
@@ -2102,6 +2223,8 @@
2102
  },
2103
  "historical_context": {
2104
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
2105
  "seller_transactions_1h": 15,
2106
  "linked_cards_7d": 2,
2107
  "recent_refunds_7d": 5,
@@ -2141,6 +2264,8 @@
2141
  },
2142
  "historical_context": {
2143
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
2144
  "seller_transactions_1h": 2,
2145
  "linked_cards_7d": 5,
2146
  "recent_refunds_7d": 4,
@@ -2180,6 +2305,8 @@
2180
  },
2181
  "historical_context": {
2182
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
2183
  "seller_transactions_1h": 13,
2184
  "linked_cards_7d": 3,
2185
  "recent_refunds_7d": 3,
@@ -2219,6 +2346,8 @@
2219
  },
2220
  "historical_context": {
2221
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
2222
  "seller_transactions_1h": 8,
2223
  "linked_cards_7d": 6,
2224
  "recent_refunds_7d": 6,
@@ -2258,6 +2387,8 @@
2258
  },
2259
  "historical_context": {
2260
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
2261
  "seller_transactions_1h": 15,
2262
  "linked_cards_7d": 7,
2263
  "recent_refunds_7d": 1,
@@ -2297,6 +2428,8 @@
2297
  },
2298
  "historical_context": {
2299
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
2300
  "seller_transactions_1h": 4,
2301
  "linked_cards_7d": 1,
2302
  "recent_refunds_7d": 4,
@@ -2336,6 +2469,8 @@
2336
  },
2337
  "historical_context": {
2338
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
 
 
2339
  "seller_transactions_1h": 12,
2340
  "linked_cards_7d": 1,
2341
  "recent_refunds_7d": 2,
@@ -2377,6 +2512,8 @@
2377
  },
2378
  "historical_context": {
2379
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
 
 
2380
  "seller_transactions_1h": 4,
2381
  "linked_cards_7d": 6,
2382
  "recent_refunds_7d": 4,
@@ -2416,6 +2553,8 @@
2416
  },
2417
  "historical_context": {
2418
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
 
 
2419
  "seller_transactions_1h": 27,
2420
  "linked_cards_7d": 2,
2421
  "recent_refunds_7d": 6,
@@ -2455,6 +2594,8 @@
2455
  },
2456
  "historical_context": {
2457
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
 
 
2458
  "seller_transactions_1h": 18,
2459
  "linked_cards_7d": 7,
2460
  "recent_refunds_7d": 0,
@@ -2494,6 +2635,8 @@
2494
  },
2495
  "historical_context": {
2496
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
 
 
2497
  "seller_transactions_1h": 26,
2498
  "linked_cards_7d": 7,
2499
  "recent_refunds_7d": 4,
@@ -2533,6 +2676,8 @@
2533
  },
2534
  "historical_context": {
2535
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
 
 
2536
  "seller_transactions_1h": 17,
2537
  "linked_cards_7d": 7,
2538
  "recent_refunds_7d": 4,
@@ -2572,6 +2717,8 @@
2572
  },
2573
  "historical_context": {
2574
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
 
 
2575
  "seller_transactions_1h": 17,
2576
  "linked_cards_7d": 2,
2577
  "recent_refunds_7d": 3,
@@ -2611,6 +2758,8 @@
2611
  },
2612
  "historical_context": {
2613
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
 
 
2614
  "seller_transactions_1h": 9,
2615
  "linked_cards_7d": 5,
2616
  "recent_refunds_7d": 2,
@@ -2650,6 +2799,8 @@
2650
  },
2651
  "historical_context": {
2652
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
 
 
2653
  "seller_transactions_1h": 13,
2654
  "linked_cards_7d": 7,
2655
  "recent_refunds_7d": 6,
@@ -2689,6 +2840,8 @@
2689
  },
2690
  "historical_context": {
2691
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
 
 
2692
  "seller_transactions_1h": 2,
2693
  "linked_cards_7d": 2,
2694
  "recent_refunds_7d": 6,
@@ -2728,6 +2881,8 @@
2728
  },
2729
  "historical_context": {
2730
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
 
 
2731
  "seller_transactions_1h": 14,
2732
  "linked_cards_7d": 4,
2733
  "recent_refunds_7d": 4,
@@ -2767,6 +2922,8 @@
2767
  },
2768
  "historical_context": {
2769
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
 
 
2770
  "seller_transactions_1h": 21,
2771
  "linked_cards_7d": 3,
2772
  "recent_refunds_7d": 2,
@@ -2806,6 +2963,8 @@
2806
  },
2807
  "historical_context": {
2808
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
 
 
2809
  "seller_transactions_1h": 24,
2810
  "linked_cards_7d": 3,
2811
  "recent_refunds_7d": 4,
@@ -2845,6 +3004,8 @@
2845
  },
2846
  "historical_context": {
2847
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
 
 
2848
  "seller_transactions_1h": 21,
2849
  "linked_cards_7d": 4,
2850
  "recent_refunds_7d": 1,
@@ -2884,6 +3045,8 @@
2884
  },
2885
  "historical_context": {
2886
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
 
 
2887
  "seller_transactions_1h": 3,
2888
  "linked_cards_7d": 3,
2889
  "recent_refunds_7d": 5,
@@ -2923,6 +3086,8 @@
2923
  },
2924
  "historical_context": {
2925
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
 
 
2926
  "seller_transactions_1h": 29,
2927
  "linked_cards_7d": 1,
2928
  "recent_refunds_7d": 5,
@@ -2962,6 +3127,8 @@
2962
  },
2963
  "historical_context": {
2964
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
 
 
2965
  "seller_transactions_1h": 17,
2966
  "linked_cards_7d": 3,
2967
  "recent_refunds_7d": 4,
@@ -3001,6 +3168,8 @@
3001
  },
3002
  "historical_context": {
3003
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3004
  "seller_transactions_1h": 31,
3005
  "linked_cards_7d": 4,
3006
  "recent_refunds_7d": 2,
@@ -3040,6 +3209,8 @@
3040
  },
3041
  "historical_context": {
3042
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3043
  "seller_transactions_1h": 12,
3044
  "linked_cards_7d": 8,
3045
  "recent_refunds_7d": 4,
@@ -3079,6 +3250,8 @@
3079
  },
3080
  "historical_context": {
3081
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3082
  "seller_transactions_1h": 13,
3083
  "linked_cards_7d": 5,
3084
  "recent_refunds_7d": 0,
@@ -3118,6 +3291,8 @@
3118
  },
3119
  "historical_context": {
3120
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3121
  "seller_transactions_1h": 21,
3122
  "linked_cards_7d": 2,
3123
  "recent_refunds_7d": 5,
@@ -3157,6 +3332,8 @@
3157
  },
3158
  "historical_context": {
3159
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3160
  "seller_transactions_1h": 7,
3161
  "linked_cards_7d": 6,
3162
  "recent_refunds_7d": 1,
@@ -3196,6 +3373,8 @@
3196
  },
3197
  "historical_context": {
3198
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3199
  "seller_transactions_1h": 4,
3200
  "linked_cards_7d": 2,
3201
  "recent_refunds_7d": 2,
@@ -3235,6 +3414,8 @@
3235
  },
3236
  "historical_context": {
3237
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3238
  "seller_transactions_1h": 1,
3239
  "linked_cards_7d": 1,
3240
  "recent_refunds_7d": 4,
@@ -3274,6 +3455,8 @@
3274
  },
3275
  "historical_context": {
3276
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3277
  "seller_transactions_1h": 31,
3278
  "linked_cards_7d": 2,
3279
  "recent_refunds_7d": 2,
@@ -3313,6 +3496,8 @@
3313
  },
3314
  "historical_context": {
3315
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3316
  "seller_transactions_1h": 31,
3317
  "linked_cards_7d": 5,
3318
  "recent_refunds_7d": 5,
@@ -3352,6 +3537,8 @@
3352
  },
3353
  "historical_context": {
3354
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3355
  "seller_transactions_1h": 15,
3356
  "linked_cards_7d": 1,
3357
  "recent_refunds_7d": 4,
@@ -3391,6 +3578,8 @@
3391
  },
3392
  "historical_context": {
3393
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3394
  "seller_transactions_1h": 17,
3395
  "linked_cards_7d": 2,
3396
  "recent_refunds_7d": 2,
@@ -3430,6 +3619,8 @@
3430
  },
3431
  "historical_context": {
3432
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3433
  "seller_transactions_1h": 25,
3434
  "linked_cards_7d": 7,
3435
  "recent_refunds_7d": 1,
@@ -3469,6 +3660,8 @@
3469
  },
3470
  "historical_context": {
3471
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3472
  "seller_transactions_1h": 18,
3473
  "linked_cards_7d": 7,
3474
  "recent_refunds_7d": 2,
@@ -3508,6 +3701,8 @@
3508
  },
3509
  "historical_context": {
3510
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3511
  "seller_transactions_1h": 17,
3512
  "linked_cards_7d": 5,
3513
  "recent_refunds_7d": 3,
@@ -3547,6 +3742,8 @@
3547
  },
3548
  "historical_context": {
3549
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3550
  "seller_transactions_1h": 24,
3551
  "linked_cards_7d": 2,
3552
  "recent_refunds_7d": 4,
@@ -3586,6 +3783,8 @@
3586
  },
3587
  "historical_context": {
3588
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3589
  "seller_transactions_1h": 7,
3590
  "linked_cards_7d": 3,
3591
  "recent_refunds_7d": 3,
@@ -3625,6 +3824,8 @@
3625
  },
3626
  "historical_context": {
3627
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3628
  "seller_transactions_1h": 23,
3629
  "linked_cards_7d": 2,
3630
  "recent_refunds_7d": 3,
@@ -3664,6 +3865,8 @@
3664
  },
3665
  "historical_context": {
3666
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3667
  "seller_transactions_1h": 14,
3668
  "linked_cards_7d": 2,
3669
  "recent_refunds_7d": 5,
@@ -3703,6 +3906,8 @@
3703
  },
3704
  "historical_context": {
3705
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3706
  "seller_transactions_1h": 4,
3707
  "linked_cards_7d": 5,
3708
  "recent_refunds_7d": 5,
@@ -3742,6 +3947,8 @@
3742
  },
3743
  "historical_context": {
3744
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3745
  "seller_transactions_1h": 17,
3746
  "linked_cards_7d": 3,
3747
  "recent_refunds_7d": 6,
@@ -3781,6 +3988,8 @@
3781
  },
3782
  "historical_context": {
3783
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3784
  "seller_transactions_1h": 8,
3785
  "linked_cards_7d": 4,
3786
  "recent_refunds_7d": 5,
@@ -3820,6 +4029,8 @@
3820
  },
3821
  "historical_context": {
3822
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3823
  "seller_transactions_1h": 7,
3824
  "linked_cards_7d": 1,
3825
  "recent_refunds_7d": 5,
@@ -3859,6 +4070,8 @@
3859
  },
3860
  "historical_context": {
3861
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3862
  "seller_transactions_1h": 30,
3863
  "linked_cards_7d": 2,
3864
  "recent_refunds_7d": 0,
@@ -3898,6 +4111,8 @@
3898
  },
3899
  "historical_context": {
3900
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3901
  "seller_transactions_1h": 2,
3902
  "linked_cards_7d": 7,
3903
  "recent_refunds_7d": 4,
@@ -3937,6 +4152,8 @@
3937
  },
3938
  "historical_context": {
3939
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3940
  "seller_transactions_1h": 9,
3941
  "linked_cards_7d": 4,
3942
  "recent_refunds_7d": 4,
@@ -3976,6 +4193,8 @@
3976
  },
3977
  "historical_context": {
3978
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3979
  "seller_transactions_1h": 27,
3980
  "linked_cards_7d": 4,
3981
  "recent_refunds_7d": 6,
@@ -4015,6 +4234,8 @@
4015
  },
4016
  "historical_context": {
4017
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
4018
  "seller_transactions_1h": 21,
4019
  "linked_cards_7d": 6,
4020
  "recent_refunds_7d": 2,
@@ -4054,6 +4275,8 @@
4054
  },
4055
  "historical_context": {
4056
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
4057
  "seller_transactions_1h": 18,
4058
  "linked_cards_7d": 2,
4059
  "recent_refunds_7d": 4,
@@ -4093,6 +4316,8 @@
4093
  },
4094
  "historical_context": {
4095
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
4096
  "seller_transactions_1h": 21,
4097
  "linked_cards_7d": 4,
4098
  "recent_refunds_7d": 6,
@@ -4132,6 +4357,8 @@
4132
  },
4133
  "historical_context": {
4134
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
4135
  "seller_transactions_1h": 34,
4136
  "linked_cards_7d": 5,
4137
  "recent_refunds_7d": 6,
@@ -4171,6 +4398,8 @@
4171
  },
4172
  "historical_context": {
4173
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
4174
  "seller_transactions_1h": 25,
4175
  "linked_cards_7d": 7,
4176
  "recent_refunds_7d": 2,
@@ -4210,6 +4439,8 @@
4210
  },
4211
  "historical_context": {
4212
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
4213
  "seller_transactions_1h": 9,
4214
  "linked_cards_7d": 2,
4215
  "recent_refunds_7d": 2,
 
1
  {
2
+ "metadata": {
3
+ "snapshot_id": "fraudshield-realworld-v2",
4
+ "schema_version": "2.0",
5
+ "generated_at": "2026-03-30T18:13:06.935921+00:00",
6
+ "seed": 42,
7
+ "build_notes": "Runtime uses this frozen snapshot only. Public source downloads are optional and intended for rebuilding the snapshot offline.",
8
+ "sources": [
9
+ {
10
+ "source_id": "kaggle_creditcardfraud",
11
+ "provider": "Kaggle / ULB",
12
+ "dataset_id": "mlg-ulb/creditcardfraud",
13
+ "title": "Credit Card Fraud Detection",
14
+ "source_url": "https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud",
15
+ "license_note": "Refer to the dataset page for license and usage terms."
16
+ }
17
+ ]
18
+ },
19
  "tasks": {
20
  "easy": [
21
  {
 
48
  },
49
  "historical_context": {
50
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
51
+ "snapshot_id": "fraudshield-realworld-v1",
52
+ "source_id": "kaggle_creditcardfraud",
53
  "seller_transactions_1h": 4,
54
  "linked_cards_7d": 3,
55
  "recent_refunds_7d": 2,
 
89
  },
90
  "historical_context": {
91
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
92
+ "snapshot_id": "fraudshield-realworld-v1",
93
+ "source_id": "kaggle_creditcardfraud",
94
  "seller_transactions_1h": 13,
95
  "linked_cards_7d": 1,
96
  "recent_refunds_7d": 1,
 
130
  },
131
  "historical_context": {
132
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
133
+ "snapshot_id": "fraudshield-realworld-v1",
134
+ "source_id": "kaggle_creditcardfraud",
135
  "seller_transactions_1h": 15,
136
  "linked_cards_7d": 3,
137
  "recent_refunds_7d": 1,
 
171
  },
172
  "historical_context": {
173
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
174
+ "snapshot_id": "fraudshield-realworld-v1",
175
+ "source_id": "kaggle_creditcardfraud",
176
  "seller_transactions_1h": 4,
177
  "linked_cards_7d": 2,
178
  "recent_refunds_7d": 0,
 
212
  },
213
  "historical_context": {
214
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
215
+ "snapshot_id": "fraudshield-realworld-v1",
216
+ "source_id": "kaggle_creditcardfraud",
217
  "seller_transactions_1h": 2,
218
  "linked_cards_7d": 2,
219
  "recent_refunds_7d": 0,
 
253
  },
254
  "historical_context": {
255
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
256
+ "snapshot_id": "fraudshield-realworld-v1",
257
+ "source_id": "kaggle_creditcardfraud",
258
  "seller_transactions_1h": 16,
259
  "linked_cards_7d": 3,
260
  "recent_refunds_7d": 1,
 
294
  },
295
  "historical_context": {
296
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
297
+ "snapshot_id": "fraudshield-realworld-v1",
298
+ "source_id": "kaggle_creditcardfraud",
299
  "seller_transactions_1h": 15,
300
  "linked_cards_7d": 2,
301
  "recent_refunds_7d": 2,
 
335
  },
336
  "historical_context": {
337
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
338
+ "snapshot_id": "fraudshield-realworld-v1",
339
+ "source_id": "kaggle_creditcardfraud",
340
  "seller_transactions_1h": 16,
341
  "linked_cards_7d": 1,
342
  "recent_refunds_7d": 1,
 
376
  },
377
  "historical_context": {
378
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
379
+ "snapshot_id": "fraudshield-realworld-v1",
380
+ "source_id": "kaggle_creditcardfraud",
381
  "seller_transactions_1h": 17,
382
  "linked_cards_7d": 4,
383
  "recent_refunds_7d": 1,
 
417
  },
418
  "historical_context": {
419
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
420
+ "snapshot_id": "fraudshield-realworld-v1",
421
+ "source_id": "kaggle_creditcardfraud",
422
  "seller_transactions_1h": 5,
423
  "linked_cards_7d": 3,
424
  "recent_refunds_7d": 2,
 
458
  },
459
  "historical_context": {
460
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
461
+ "snapshot_id": "fraudshield-realworld-v1",
462
+ "source_id": "kaggle_creditcardfraud",
463
  "seller_transactions_1h": 18,
464
  "linked_cards_7d": 3,
465
  "recent_refunds_7d": 1,
 
499
  },
500
  "historical_context": {
501
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
502
+ "snapshot_id": "fraudshield-realworld-v1",
503
+ "source_id": "kaggle_creditcardfraud",
504
  "seller_transactions_1h": 15,
505
  "linked_cards_7d": 3,
506
  "recent_refunds_7d": 1,
 
540
  },
541
  "historical_context": {
542
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
543
+ "snapshot_id": "fraudshield-realworld-v1",
544
+ "source_id": "kaggle_creditcardfraud",
545
  "seller_transactions_1h": 8,
546
  "linked_cards_7d": 1,
547
  "recent_refunds_7d": 1,
 
581
  },
582
  "historical_context": {
583
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
584
+ "snapshot_id": "fraudshield-realworld-v1",
585
+ "source_id": "kaggle_creditcardfraud",
586
  "seller_transactions_1h": 8,
587
  "linked_cards_7d": 2,
588
  "recent_refunds_7d": 1,
 
622
  },
623
  "historical_context": {
624
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
625
+ "snapshot_id": "fraudshield-realworld-v1",
626
+ "source_id": "kaggle_creditcardfraud",
627
  "seller_transactions_1h": 6,
628
  "linked_cards_7d": 1,
629
  "recent_refunds_7d": 2,
 
663
  },
664
  "historical_context": {
665
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
666
+ "snapshot_id": "fraudshield-realworld-v1",
667
+ "source_id": "kaggle_creditcardfraud",
668
  "seller_transactions_1h": 9,
669
  "linked_cards_7d": 2,
670
  "recent_refunds_7d": 1,
 
704
  },
705
  "historical_context": {
706
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
707
+ "snapshot_id": "fraudshield-realworld-v1",
708
+ "source_id": "kaggle_creditcardfraud",
709
  "seller_transactions_1h": 11,
710
  "linked_cards_7d": 1,
711
  "recent_refunds_7d": 1,
 
745
  },
746
  "historical_context": {
747
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
748
+ "snapshot_id": "fraudshield-realworld-v1",
749
+ "source_id": "kaggle_creditcardfraud",
750
  "seller_transactions_1h": 2,
751
  "linked_cards_7d": 1,
752
  "recent_refunds_7d": 1,
 
786
  },
787
  "historical_context": {
788
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
789
+ "snapshot_id": "fraudshield-realworld-v1",
790
+ "source_id": "kaggle_creditcardfraud",
791
  "seller_transactions_1h": 10,
792
  "linked_cards_7d": 4,
793
  "recent_refunds_7d": 0,
 
827
  },
828
  "historical_context": {
829
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
830
+ "snapshot_id": "fraudshield-realworld-v1",
831
+ "source_id": "kaggle_creditcardfraud",
832
  "seller_transactions_1h": 1,
833
  "linked_cards_7d": 3,
834
  "recent_refunds_7d": 1,
 
868
  },
869
  "historical_context": {
870
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
871
+ "snapshot_id": "fraudshield-realworld-v1",
872
+ "source_id": "kaggle_creditcardfraud",
873
  "seller_transactions_1h": 5,
874
  "linked_cards_7d": 3,
875
  "recent_refunds_7d": 2,
 
909
  },
910
  "historical_context": {
911
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
912
+ "snapshot_id": "fraudshield-realworld-v1",
913
+ "source_id": "kaggle_creditcardfraud",
914
  "seller_transactions_1h": 17,
915
  "linked_cards_7d": 2,
916
  "recent_refunds_7d": 2,
 
950
  },
951
  "historical_context": {
952
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
953
+ "snapshot_id": "fraudshield-realworld-v1",
954
+ "source_id": "kaggle_creditcardfraud",
955
  "seller_transactions_1h": 14,
956
  "linked_cards_7d": 3,
957
  "recent_refunds_7d": 1,
 
991
  },
992
  "historical_context": {
993
  "task_focus": "Obvious fraud markers such as new sellers, price gaps, and geo mismatch.",
994
+ "snapshot_id": "fraudshield-realworld-v1",
995
+ "source_id": "kaggle_creditcardfraud",
996
  "seller_transactions_1h": 10,
997
  "linked_cards_7d": 2,
998
  "recent_refunds_7d": 1,
 
1034
  },
1035
  "historical_context": {
1036
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1037
+ "snapshot_id": "fraudshield-realworld-v1",
1038
+ "source_id": "kaggle_creditcardfraud",
1039
  "seller_transactions_1h": 11,
1040
  "linked_cards_7d": 4,
1041
  "recent_refunds_7d": 5,
 
1075
  },
1076
  "historical_context": {
1077
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1078
+ "snapshot_id": "fraudshield-realworld-v1",
1079
+ "source_id": "kaggle_creditcardfraud",
1080
  "seller_transactions_1h": 15,
1081
  "linked_cards_7d": 3,
1082
  "recent_refunds_7d": 2,
 
1116
  },
1117
  "historical_context": {
1118
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1119
+ "snapshot_id": "fraudshield-realworld-v1",
1120
+ "source_id": "kaggle_creditcardfraud",
1121
  "seller_transactions_1h": 6,
1122
  "linked_cards_7d": 5,
1123
  "recent_refunds_7d": 5,
 
1157
  },
1158
  "historical_context": {
1159
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1160
+ "snapshot_id": "fraudshield-realworld-v1",
1161
+ "source_id": "kaggle_creditcardfraud",
1162
  "seller_transactions_1h": 3,
1163
  "linked_cards_7d": 7,
1164
  "recent_refunds_7d": 1,
 
1198
  },
1199
  "historical_context": {
1200
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1201
+ "snapshot_id": "fraudshield-realworld-v1",
1202
+ "source_id": "kaggle_creditcardfraud",
1203
  "seller_transactions_1h": 17,
1204
  "linked_cards_7d": 6,
1205
  "recent_refunds_7d": 0,
 
1239
  },
1240
  "historical_context": {
1241
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1242
+ "snapshot_id": "fraudshield-realworld-v1",
1243
+ "source_id": "kaggle_creditcardfraud",
1244
  "seller_transactions_1h": 17,
1245
  "linked_cards_7d": 6,
1246
  "recent_refunds_7d": 6,
 
1280
  },
1281
  "historical_context": {
1282
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1283
+ "snapshot_id": "fraudshield-realworld-v1",
1284
+ "source_id": "kaggle_creditcardfraud",
1285
  "seller_transactions_1h": 4,
1286
  "linked_cards_7d": 7,
1287
  "recent_refunds_7d": 1,
 
1321
  },
1322
  "historical_context": {
1323
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1324
+ "snapshot_id": "fraudshield-realworld-v1",
1325
+ "source_id": "kaggle_creditcardfraud",
1326
  "seller_transactions_1h": 4,
1327
  "linked_cards_7d": 8,
1328
  "recent_refunds_7d": 2,
 
1362
  },
1363
  "historical_context": {
1364
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1365
+ "snapshot_id": "fraudshield-realworld-v1",
1366
+ "source_id": "kaggle_creditcardfraud",
1367
  "seller_transactions_1h": 10,
1368
  "linked_cards_7d": 6,
1369
  "recent_refunds_7d": 1,
 
1403
  },
1404
  "historical_context": {
1405
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1406
+ "snapshot_id": "fraudshield-realworld-v1",
1407
+ "source_id": "kaggle_creditcardfraud",
1408
  "seller_transactions_1h": 12,
1409
  "linked_cards_7d": 5,
1410
  "recent_refunds_7d": 6,
 
1444
  },
1445
  "historical_context": {
1446
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1447
+ "snapshot_id": "fraudshield-realworld-v1",
1448
+ "source_id": "kaggle_creditcardfraud",
1449
  "seller_transactions_1h": 18,
1450
  "linked_cards_7d": 6,
1451
  "recent_refunds_7d": 3,
 
1485
  },
1486
  "historical_context": {
1487
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1488
+ "snapshot_id": "fraudshield-realworld-v1",
1489
+ "source_id": "kaggle_creditcardfraud",
1490
  "seller_transactions_1h": 6,
1491
  "linked_cards_7d": 2,
1492
  "recent_refunds_7d": 1,
 
1526
  },
1527
  "historical_context": {
1528
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1529
+ "snapshot_id": "fraudshield-realworld-v1",
1530
+ "source_id": "kaggle_creditcardfraud",
1531
  "seller_transactions_1h": 8,
1532
  "linked_cards_7d": 5,
1533
  "recent_refunds_7d": 2,
 
1567
  },
1568
  "historical_context": {
1569
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1570
+ "snapshot_id": "fraudshield-realworld-v1",
1571
+ "source_id": "kaggle_creditcardfraud",
1572
  "seller_transactions_1h": 10,
1573
  "linked_cards_7d": 3,
1574
  "recent_refunds_7d": 6,
 
1608
  },
1609
  "historical_context": {
1610
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1611
+ "snapshot_id": "fraudshield-realworld-v1",
1612
+ "source_id": "kaggle_creditcardfraud",
1613
  "seller_transactions_1h": 3,
1614
  "linked_cards_7d": 5,
1615
  "recent_refunds_7d": 4,
 
1649
  },
1650
  "historical_context": {
1651
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1652
+ "snapshot_id": "fraudshield-realworld-v1",
1653
+ "source_id": "kaggle_creditcardfraud",
1654
  "seller_transactions_1h": 15,
1655
  "linked_cards_7d": 1,
1656
  "recent_refunds_7d": 5,
 
1690
  },
1691
  "historical_context": {
1692
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1693
+ "snapshot_id": "fraudshield-realworld-v1",
1694
+ "source_id": "kaggle_creditcardfraud",
1695
  "seller_transactions_1h": 17,
1696
  "linked_cards_7d": 5,
1697
  "recent_refunds_7d": 5,
 
1731
  },
1732
  "historical_context": {
1733
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1734
+ "snapshot_id": "fraudshield-realworld-v1",
1735
+ "source_id": "kaggle_creditcardfraud",
1736
  "seller_transactions_1h": 5,
1737
  "linked_cards_7d": 5,
1738
  "recent_refunds_7d": 1,
 
1772
  },
1773
  "historical_context": {
1774
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1775
+ "snapshot_id": "fraudshield-realworld-v1",
1776
+ "source_id": "kaggle_creditcardfraud",
1777
  "seller_transactions_1h": 17,
1778
  "linked_cards_7d": 2,
1779
  "recent_refunds_7d": 5,
 
1813
  },
1814
  "historical_context": {
1815
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1816
+ "snapshot_id": "fraudshield-realworld-v1",
1817
+ "source_id": "kaggle_creditcardfraud",
1818
  "seller_transactions_1h": 1,
1819
  "linked_cards_7d": 4,
1820
  "recent_refunds_7d": 6,
 
1854
  },
1855
  "historical_context": {
1856
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1857
+ "snapshot_id": "fraudshield-realworld-v1",
1858
+ "source_id": "kaggle_creditcardfraud",
1859
  "seller_transactions_1h": 14,
1860
  "linked_cards_7d": 6,
1861
  "recent_refunds_7d": 3,
 
1895
  },
1896
  "historical_context": {
1897
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1898
+ "snapshot_id": "fraudshield-realworld-v1",
1899
+ "source_id": "kaggle_creditcardfraud",
1900
  "seller_transactions_1h": 18,
1901
  "linked_cards_7d": 2,
1902
  "recent_refunds_7d": 4,
 
1936
  },
1937
  "historical_context": {
1938
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1939
+ "snapshot_id": "fraudshield-realworld-v1",
1940
+ "source_id": "kaggle_creditcardfraud",
1941
  "seller_transactions_1h": 10,
1942
  "linked_cards_7d": 1,
1943
  "recent_refunds_7d": 3,
 
1977
  },
1978
  "historical_context": {
1979
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
1980
+ "snapshot_id": "fraudshield-realworld-v1",
1981
+ "source_id": "kaggle_creditcardfraud",
1982
  "seller_transactions_1h": 13,
1983
  "linked_cards_7d": 2,
1984
  "recent_refunds_7d": 6,
 
2018
  },
2019
  "historical_context": {
2020
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
2021
+ "snapshot_id": "fraudshield-realworld-v1",
2022
+ "source_id": "kaggle_creditcardfraud",
2023
  "seller_transactions_1h": 15,
2024
  "linked_cards_7d": 3,
2025
  "recent_refunds_7d": 6,
 
2059
  },
2060
  "historical_context": {
2061
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
2062
+ "snapshot_id": "fraudshield-realworld-v1",
2063
+ "source_id": "kaggle_creditcardfraud",
2064
  "seller_transactions_1h": 7,
2065
  "linked_cards_7d": 6,
2066
  "recent_refunds_7d": 3,
 
2100
  },
2101
  "historical_context": {
2102
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
2103
+ "snapshot_id": "fraudshield-realworld-v1",
2104
+ "source_id": "kaggle_creditcardfraud",
2105
  "seller_transactions_1h": 14,
2106
  "linked_cards_7d": 1,
2107
  "recent_refunds_7d": 2,
 
2141
  },
2142
  "historical_context": {
2143
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
2144
+ "snapshot_id": "fraudshield-realworld-v1",
2145
+ "source_id": "kaggle_creditcardfraud",
2146
  "seller_transactions_1h": 8,
2147
  "linked_cards_7d": 4,
2148
  "recent_refunds_7d": 5,
 
2182
  },
2183
  "historical_context": {
2184
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
2185
+ "snapshot_id": "fraudshield-realworld-v1",
2186
+ "source_id": "kaggle_creditcardfraud",
2187
  "seller_transactions_1h": 5,
2188
  "linked_cards_7d": 4,
2189
  "recent_refunds_7d": 1,
 
2223
  },
2224
  "historical_context": {
2225
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
2226
+ "snapshot_id": "fraudshield-realworld-v1",
2227
+ "source_id": "kaggle_creditcardfraud",
2228
  "seller_transactions_1h": 15,
2229
  "linked_cards_7d": 2,
2230
  "recent_refunds_7d": 5,
 
2264
  },
2265
  "historical_context": {
2266
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
2267
+ "snapshot_id": "fraudshield-realworld-v1",
2268
+ "source_id": "kaggle_creditcardfraud",
2269
  "seller_transactions_1h": 2,
2270
  "linked_cards_7d": 5,
2271
  "recent_refunds_7d": 4,
 
2305
  },
2306
  "historical_context": {
2307
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
2308
+ "snapshot_id": "fraudshield-realworld-v1",
2309
+ "source_id": "kaggle_creditcardfraud",
2310
  "seller_transactions_1h": 13,
2311
  "linked_cards_7d": 3,
2312
  "recent_refunds_7d": 3,
 
2346
  },
2347
  "historical_context": {
2348
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
2349
+ "snapshot_id": "fraudshield-realworld-v1",
2350
+ "source_id": "kaggle_creditcardfraud",
2351
  "seller_transactions_1h": 8,
2352
  "linked_cards_7d": 6,
2353
  "recent_refunds_7d": 6,
 
2387
  },
2388
  "historical_context": {
2389
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
2390
+ "snapshot_id": "fraudshield-realworld-v1",
2391
+ "source_id": "kaggle_creditcardfraud",
2392
  "seller_transactions_1h": 15,
2393
  "linked_cards_7d": 7,
2394
  "recent_refunds_7d": 1,
 
2428
  },
2429
  "historical_context": {
2430
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
2431
+ "snapshot_id": "fraudshield-realworld-v1",
2432
+ "source_id": "kaggle_creditcardfraud",
2433
  "seller_transactions_1h": 4,
2434
  "linked_cards_7d": 1,
2435
  "recent_refunds_7d": 4,
 
2469
  },
2470
  "historical_context": {
2471
  "task_focus": "Mixed-signal reviews where no single indicator is decisive.",
2472
+ "snapshot_id": "fraudshield-realworld-v1",
2473
+ "source_id": "kaggle_creditcardfraud",
2474
  "seller_transactions_1h": 12,
2475
  "linked_cards_7d": 1,
2476
  "recent_refunds_7d": 2,
 
2512
  },
2513
  "historical_context": {
2514
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
2515
+ "snapshot_id": "fraudshield-realworld-v1",
2516
+ "source_id": "kaggle_creditcardfraud",
2517
  "seller_transactions_1h": 4,
2518
  "linked_cards_7d": 6,
2519
  "recent_refunds_7d": 4,
 
2553
  },
2554
  "historical_context": {
2555
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
2556
+ "snapshot_id": "fraudshield-realworld-v1",
2557
+ "source_id": "kaggle_creditcardfraud",
2558
  "seller_transactions_1h": 27,
2559
  "linked_cards_7d": 2,
2560
  "recent_refunds_7d": 6,
 
2594
  },
2595
  "historical_context": {
2596
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
2597
+ "snapshot_id": "fraudshield-realworld-v1",
2598
+ "source_id": "kaggle_creditcardfraud",
2599
  "seller_transactions_1h": 18,
2600
  "linked_cards_7d": 7,
2601
  "recent_refunds_7d": 0,
 
2635
  },
2636
  "historical_context": {
2637
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
2638
+ "snapshot_id": "fraudshield-realworld-v1",
2639
+ "source_id": "kaggle_creditcardfraud",
2640
  "seller_transactions_1h": 26,
2641
  "linked_cards_7d": 7,
2642
  "recent_refunds_7d": 4,
 
2676
  },
2677
  "historical_context": {
2678
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
2679
+ "snapshot_id": "fraudshield-realworld-v1",
2680
+ "source_id": "kaggle_creditcardfraud",
2681
  "seller_transactions_1h": 17,
2682
  "linked_cards_7d": 7,
2683
  "recent_refunds_7d": 4,
 
2717
  },
2718
  "historical_context": {
2719
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
2720
+ "snapshot_id": "fraudshield-realworld-v1",
2721
+ "source_id": "kaggle_creditcardfraud",
2722
  "seller_transactions_1h": 17,
2723
  "linked_cards_7d": 2,
2724
  "recent_refunds_7d": 3,
 
2758
  },
2759
  "historical_context": {
2760
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
2761
+ "snapshot_id": "fraudshield-realworld-v1",
2762
+ "source_id": "kaggle_creditcardfraud",
2763
  "seller_transactions_1h": 9,
2764
  "linked_cards_7d": 5,
2765
  "recent_refunds_7d": 2,
 
2799
  },
2800
  "historical_context": {
2801
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
2802
+ "snapshot_id": "fraudshield-realworld-v1",
2803
+ "source_id": "kaggle_creditcardfraud",
2804
  "seller_transactions_1h": 13,
2805
  "linked_cards_7d": 7,
2806
  "recent_refunds_7d": 6,
 
2840
  },
2841
  "historical_context": {
2842
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
2843
+ "snapshot_id": "fraudshield-realworld-v1",
2844
+ "source_id": "kaggle_creditcardfraud",
2845
  "seller_transactions_1h": 2,
2846
  "linked_cards_7d": 2,
2847
  "recent_refunds_7d": 6,
 
2881
  },
2882
  "historical_context": {
2883
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
2884
+ "snapshot_id": "fraudshield-realworld-v1",
2885
+ "source_id": "kaggle_creditcardfraud",
2886
  "seller_transactions_1h": 14,
2887
  "linked_cards_7d": 4,
2888
  "recent_refunds_7d": 4,
 
2922
  },
2923
  "historical_context": {
2924
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
2925
+ "snapshot_id": "fraudshield-realworld-v1",
2926
+ "source_id": "kaggle_creditcardfraud",
2927
  "seller_transactions_1h": 21,
2928
  "linked_cards_7d": 3,
2929
  "recent_refunds_7d": 2,
 
2963
  },
2964
  "historical_context": {
2965
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
2966
+ "snapshot_id": "fraudshield-realworld-v1",
2967
+ "source_id": "kaggle_creditcardfraud",
2968
  "seller_transactions_1h": 24,
2969
  "linked_cards_7d": 3,
2970
  "recent_refunds_7d": 4,
 
3004
  },
3005
  "historical_context": {
3006
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3007
+ "snapshot_id": "fraudshield-realworld-v1",
3008
+ "source_id": "kaggle_creditcardfraud",
3009
  "seller_transactions_1h": 21,
3010
  "linked_cards_7d": 4,
3011
  "recent_refunds_7d": 1,
 
3045
  },
3046
  "historical_context": {
3047
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3048
+ "snapshot_id": "fraudshield-realworld-v1",
3049
+ "source_id": "kaggle_creditcardfraud",
3050
  "seller_transactions_1h": 3,
3051
  "linked_cards_7d": 3,
3052
  "recent_refunds_7d": 5,
 
3086
  },
3087
  "historical_context": {
3088
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3089
+ "snapshot_id": "fraudshield-realworld-v1",
3090
+ "source_id": "kaggle_creditcardfraud",
3091
  "seller_transactions_1h": 29,
3092
  "linked_cards_7d": 1,
3093
  "recent_refunds_7d": 5,
 
3127
  },
3128
  "historical_context": {
3129
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3130
+ "snapshot_id": "fraudshield-realworld-v1",
3131
+ "source_id": "kaggle_creditcardfraud",
3132
  "seller_transactions_1h": 17,
3133
  "linked_cards_7d": 3,
3134
  "recent_refunds_7d": 4,
 
3168
  },
3169
  "historical_context": {
3170
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3171
+ "snapshot_id": "fraudshield-realworld-v1",
3172
+ "source_id": "kaggle_creditcardfraud",
3173
  "seller_transactions_1h": 31,
3174
  "linked_cards_7d": 4,
3175
  "recent_refunds_7d": 2,
 
3209
  },
3210
  "historical_context": {
3211
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3212
+ "snapshot_id": "fraudshield-realworld-v1",
3213
+ "source_id": "kaggle_creditcardfraud",
3214
  "seller_transactions_1h": 12,
3215
  "linked_cards_7d": 8,
3216
  "recent_refunds_7d": 4,
 
3250
  },
3251
  "historical_context": {
3252
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3253
+ "snapshot_id": "fraudshield-realworld-v1",
3254
+ "source_id": "kaggle_creditcardfraud",
3255
  "seller_transactions_1h": 13,
3256
  "linked_cards_7d": 5,
3257
  "recent_refunds_7d": 0,
 
3291
  },
3292
  "historical_context": {
3293
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3294
+ "snapshot_id": "fraudshield-realworld-v1",
3295
+ "source_id": "kaggle_creditcardfraud",
3296
  "seller_transactions_1h": 21,
3297
  "linked_cards_7d": 2,
3298
  "recent_refunds_7d": 5,
 
3332
  },
3333
  "historical_context": {
3334
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3335
+ "snapshot_id": "fraudshield-realworld-v1",
3336
+ "source_id": "kaggle_creditcardfraud",
3337
  "seller_transactions_1h": 7,
3338
  "linked_cards_7d": 6,
3339
  "recent_refunds_7d": 1,
 
3373
  },
3374
  "historical_context": {
3375
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3376
+ "snapshot_id": "fraudshield-realworld-v1",
3377
+ "source_id": "kaggle_creditcardfraud",
3378
  "seller_transactions_1h": 4,
3379
  "linked_cards_7d": 2,
3380
  "recent_refunds_7d": 2,
 
3414
  },
3415
  "historical_context": {
3416
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3417
+ "snapshot_id": "fraudshield-realworld-v1",
3418
+ "source_id": "kaggle_creditcardfraud",
3419
  "seller_transactions_1h": 1,
3420
  "linked_cards_7d": 1,
3421
  "recent_refunds_7d": 4,
 
3455
  },
3456
  "historical_context": {
3457
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3458
+ "snapshot_id": "fraudshield-realworld-v1",
3459
+ "source_id": "kaggle_creditcardfraud",
3460
  "seller_transactions_1h": 31,
3461
  "linked_cards_7d": 2,
3462
  "recent_refunds_7d": 2,
 
3496
  },
3497
  "historical_context": {
3498
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3499
+ "snapshot_id": "fraudshield-realworld-v1",
3500
+ "source_id": "kaggle_creditcardfraud",
3501
  "seller_transactions_1h": 31,
3502
  "linked_cards_7d": 5,
3503
  "recent_refunds_7d": 5,
 
3537
  },
3538
  "historical_context": {
3539
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3540
+ "snapshot_id": "fraudshield-realworld-v1",
3541
+ "source_id": "kaggle_creditcardfraud",
3542
  "seller_transactions_1h": 15,
3543
  "linked_cards_7d": 1,
3544
  "recent_refunds_7d": 4,
 
3578
  },
3579
  "historical_context": {
3580
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3581
+ "snapshot_id": "fraudshield-realworld-v1",
3582
+ "source_id": "kaggle_creditcardfraud",
3583
  "seller_transactions_1h": 17,
3584
  "linked_cards_7d": 2,
3585
  "recent_refunds_7d": 2,
 
3619
  },
3620
  "historical_context": {
3621
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3622
+ "snapshot_id": "fraudshield-realworld-v1",
3623
+ "source_id": "kaggle_creditcardfraud",
3624
  "seller_transactions_1h": 25,
3625
  "linked_cards_7d": 7,
3626
  "recent_refunds_7d": 1,
 
3660
  },
3661
  "historical_context": {
3662
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3663
+ "snapshot_id": "fraudshield-realworld-v1",
3664
+ "source_id": "kaggle_creditcardfraud",
3665
  "seller_transactions_1h": 18,
3666
  "linked_cards_7d": 7,
3667
  "recent_refunds_7d": 2,
 
3701
  },
3702
  "historical_context": {
3703
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3704
+ "snapshot_id": "fraudshield-realworld-v1",
3705
+ "source_id": "kaggle_creditcardfraud",
3706
  "seller_transactions_1h": 17,
3707
  "linked_cards_7d": 5,
3708
  "recent_refunds_7d": 3,
 
3742
  },
3743
  "historical_context": {
3744
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3745
+ "snapshot_id": "fraudshield-realworld-v1",
3746
+ "source_id": "kaggle_creditcardfraud",
3747
  "seller_transactions_1h": 24,
3748
  "linked_cards_7d": 2,
3749
  "recent_refunds_7d": 4,
 
3783
  },
3784
  "historical_context": {
3785
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3786
+ "snapshot_id": "fraudshield-realworld-v1",
3787
+ "source_id": "kaggle_creditcardfraud",
3788
  "seller_transactions_1h": 7,
3789
  "linked_cards_7d": 3,
3790
  "recent_refunds_7d": 3,
 
3824
  },
3825
  "historical_context": {
3826
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3827
+ "snapshot_id": "fraudshield-realworld-v1",
3828
+ "source_id": "kaggle_creditcardfraud",
3829
  "seller_transactions_1h": 23,
3830
  "linked_cards_7d": 2,
3831
  "recent_refunds_7d": 3,
 
3865
  },
3866
  "historical_context": {
3867
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3868
+ "snapshot_id": "fraudshield-realworld-v1",
3869
+ "source_id": "kaggle_creditcardfraud",
3870
  "seller_transactions_1h": 14,
3871
  "linked_cards_7d": 2,
3872
  "recent_refunds_7d": 5,
 
3906
  },
3907
  "historical_context": {
3908
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3909
+ "snapshot_id": "fraudshield-realworld-v1",
3910
+ "source_id": "kaggle_creditcardfraud",
3911
  "seller_transactions_1h": 4,
3912
  "linked_cards_7d": 5,
3913
  "recent_refunds_7d": 5,
 
3947
  },
3948
  "historical_context": {
3949
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3950
+ "snapshot_id": "fraudshield-realworld-v1",
3951
+ "source_id": "kaggle_creditcardfraud",
3952
  "seller_transactions_1h": 17,
3953
  "linked_cards_7d": 3,
3954
  "recent_refunds_7d": 6,
 
3988
  },
3989
  "historical_context": {
3990
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
3991
+ "snapshot_id": "fraudshield-realworld-v1",
3992
+ "source_id": "kaggle_creditcardfraud",
3993
  "seller_transactions_1h": 8,
3994
  "linked_cards_7d": 4,
3995
  "recent_refunds_7d": 5,
 
4029
  },
4030
  "historical_context": {
4031
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
4032
+ "snapshot_id": "fraudshield-realworld-v1",
4033
+ "source_id": "kaggle_creditcardfraud",
4034
  "seller_transactions_1h": 7,
4035
  "linked_cards_7d": 1,
4036
  "recent_refunds_7d": 5,
 
4070
  },
4071
  "historical_context": {
4072
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
4073
+ "snapshot_id": "fraudshield-realworld-v1",
4074
+ "source_id": "kaggle_creditcardfraud",
4075
  "seller_transactions_1h": 30,
4076
  "linked_cards_7d": 2,
4077
  "recent_refunds_7d": 0,
 
4111
  },
4112
  "historical_context": {
4113
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
4114
+ "snapshot_id": "fraudshield-realworld-v1",
4115
+ "source_id": "kaggle_creditcardfraud",
4116
  "seller_transactions_1h": 2,
4117
  "linked_cards_7d": 7,
4118
  "recent_refunds_7d": 4,
 
4152
  },
4153
  "historical_context": {
4154
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
4155
+ "snapshot_id": "fraudshield-realworld-v1",
4156
+ "source_id": "kaggle_creditcardfraud",
4157
  "seller_transactions_1h": 9,
4158
  "linked_cards_7d": 4,
4159
  "recent_refunds_7d": 4,
 
4193
  },
4194
  "historical_context": {
4195
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
4196
+ "snapshot_id": "fraudshield-realworld-v1",
4197
+ "source_id": "kaggle_creditcardfraud",
4198
  "seller_transactions_1h": 27,
4199
  "linked_cards_7d": 4,
4200
  "recent_refunds_7d": 6,
 
4234
  },
4235
  "historical_context": {
4236
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
4237
+ "snapshot_id": "fraudshield-realworld-v1",
4238
+ "source_id": "kaggle_creditcardfraud",
4239
  "seller_transactions_1h": 21,
4240
  "linked_cards_7d": 6,
4241
  "recent_refunds_7d": 2,
 
4275
  },
4276
  "historical_context": {
4277
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
4278
+ "snapshot_id": "fraudshield-realworld-v1",
4279
+ "source_id": "kaggle_creditcardfraud",
4280
  "seller_transactions_1h": 18,
4281
  "linked_cards_7d": 2,
4282
  "recent_refunds_7d": 4,
 
4316
  },
4317
  "historical_context": {
4318
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
4319
+ "snapshot_id": "fraudshield-realworld-v1",
4320
+ "source_id": "kaggle_creditcardfraud",
4321
  "seller_transactions_1h": 21,
4322
  "linked_cards_7d": 4,
4323
  "recent_refunds_7d": 6,
 
4357
  },
4358
  "historical_context": {
4359
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
4360
+ "snapshot_id": "fraudshield-realworld-v1",
4361
+ "source_id": "kaggle_creditcardfraud",
4362
  "seller_transactions_1h": 34,
4363
  "linked_cards_7d": 5,
4364
  "recent_refunds_7d": 6,
 
4398
  },
4399
  "historical_context": {
4400
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
4401
+ "snapshot_id": "fraudshield-realworld-v1",
4402
+ "source_id": "kaggle_creditcardfraud",
4403
  "seller_transactions_1h": 25,
4404
  "linked_cards_7d": 7,
4405
  "recent_refunds_7d": 2,
 
4439
  },
4440
  "historical_context": {
4441
  "task_focus": "Coordinated abuse and high-volume legitimate edge cases with overlap.",
4442
+ "snapshot_id": "fraudshield-realworld-v1",
4443
+ "source_id": "kaggle_creditcardfraud",
4444
  "seller_transactions_1h": 9,
4445
  "linked_cards_7d": 2,
4446
  "recent_refunds_7d": 2,
data_loader.py CHANGED
@@ -1,11 +1,11 @@
1
- """Deterministic FraudShield task bundle loader."""
2
 
3
  from __future__ import annotations
4
 
5
  import hashlib
6
  import json
7
  import logging
8
- from datetime import datetime, timedelta
9
  from pathlib import Path
10
  from typing import Any, Dict, List, Tuple
11
 
@@ -14,6 +14,19 @@ import pandas as pd
14
 
15
  logger = logging.getLogger(__name__)
16
 
17
  TASK_SPECS: Dict[str, Dict[str, Any]] = {
18
  "easy": {
19
  "per_class": 12,
@@ -36,8 +49,8 @@ TASK_SPECS: Dict[str, Dict[str, Any]] = {
36
  }
37
 
38
 
39
- class KaggleDataLoader:
40
- """Loads the curated FraudShield task bundle or builds it from Kaggle data."""
41
 
42
  def __init__(self, data_path: str = "data", seed: int = 42):
43
  self.seed = seed
@@ -47,54 +60,98 @@ class KaggleDataLoader:
47
  self.bundle_file = self.data_path / "fraudshield_cases.json"
48
  self.df: pd.DataFrame | None = None
49
  self.task_bundle: Dict[str, List[Dict[str, Any]]] = {}
50
 
51
- def download_data(self) -> bool:
52
- """Download the source Kaggle dataset for bundle regeneration."""
53
 
54
  try:
55
  import kaggle
56
 
57
- logger.info("Downloading Kaggle Credit Card Fraud dataset...")
58
  kaggle.api.dataset_download_files(
59
- "mlg-ulb/creditcardfraud",
60
  path=str(self.data_path),
61
  unzip=True,
62
  )
63
  logger.info("Downloaded source data to %s", self.data_path)
64
  return True
65
  except Exception as exc: # pragma: no cover - external dependency
66
- logger.error("Failed to download Kaggle data: %s", exc)
67
  return False
68
 
69
- def load_data(self) -> bool:
70
- """Load the compact task bundle, building it from CSV if needed."""
71
 
72
  if self.bundle_file.exists():
73
- self.task_bundle = json.loads(self.bundle_file.read_text(encoding="utf-8"))["tasks"]
74
- logger.info("Loaded curated task bundle from %s", self.bundle_file)
75
  return True
76
 
77
  if not self.csv_file.exists():
78
  logger.error("Neither %s nor %s is available.", self.bundle_file, self.csv_file)
79
  return False
80
 
81
- logger.info("Building curated task bundle from %s", self.csv_file)
82
  self.df = pd.read_csv(self.csv_file)
83
  self.task_bundle = self._build_task_bundle()
84
- payload = {
85
- "seed": self.seed,
86
- "source": "Kaggle creditcardfraud",
87
- "tasks": self.task_bundle,
88
- }
89
  self.bundle_file.write_text(json.dumps(payload, indent=2), encoding="utf-8")
90
- logger.info("Wrote deterministic task bundle to %s", self.bundle_file)
91
  return True
92
 
93
  def get_task_cases(self, task: str) -> List[Dict[str, Any]]:
94
  """Return the full case records for a given task."""
95
 
96
  if not self.task_bundle:
97
- raise RuntimeError("Data not loaded. Call load_data() first.")
98
  if task not in self.task_bundle:
99
  raise ValueError(f"Unknown task: {task}")
100
  return list(self.task_bundle[task])
@@ -116,8 +173,59 @@ class KaggleDataLoader:
116
  [case["label"] for case in hard_cases],
117
  )
118
 
119
  def _build_task_bundle(self) -> Dict[str, List[Dict[str, Any]]]:
120
- """Create deterministic tasks from the Kaggle dataset."""
121
 
122
  if self.df is None:
123
  raise RuntimeError("Source dataframe is not loaded.")
@@ -199,7 +307,7 @@ class KaggleDataLoader:
199
  label: str,
200
  local_index: int,
201
  ) -> Dict[str, Any]:
202
- """Convert a Kaggle row into one deterministic marketplace case."""
203
 
204
  row_id = int(row["row_id"])
205
  anomaly_strength = float(row["case_score"])
@@ -256,7 +364,7 @@ class KaggleDataLoader:
256
  )
257
 
258
  business_cost = self._business_cost(task_name, label, anomaly_strength)
259
- case = {
260
  "transaction_id": f"{task_name}_{label}_{row_id}",
261
  "label": label,
262
  "risk_score": round(base_risk, 4),
@@ -264,7 +372,6 @@ class KaggleDataLoader:
264
  "transaction_data": transaction_data,
265
  "historical_context": historical_context,
266
  }
267
- return case
268
 
269
  @staticmethod
270
  def _timestamp_from_seconds(raw_seconds: float) -> str:
@@ -529,10 +636,20 @@ class KaggleDataLoader:
529
  fraud_ring_group: int | None,
530
  flash_sale_group: int | None,
531
  ) -> Dict[str, Any]:
532
- seller_velocity = self._stable_int(1, 18 if task_name != "hard" else 35, "seller-velocity", task_name, label, row_id)
533
  linked_cards = self._stable_int(1, 4 if task_name == "easy" else 8, "linked-cards", task_name, label, row_id)
534
  recent_refunds = self._stable_int(0, 2 if task_name == "easy" else 6, "refunds", task_name, label, row_id)
535
- cluster_alert = round(self._clamp(0.20 + anomaly_strength * 0.55 + (0.10 if fraud_ring_group is not None else 0.0)), 3)
536
 
537
  note = TASK_SPECS[task_name]["focus"]
538
  if fraud_ring_group is not None:
@@ -542,6 +659,8 @@ class KaggleDataLoader:
542
 
543
  return {
544
  "task_focus": TASK_SPECS[task_name]["focus"],
 
 
545
  "seller_transactions_1h": seller_velocity,
546
  "linked_cards_7d": linked_cards,
547
  "recent_refunds_7d": recent_refunds,
@@ -555,3 +674,7 @@ class KaggleDataLoader:
555
  task_bias = {"easy": 0.00, "medium": 0.10, "hard": 0.18}[task_name]
556
  label_bias = 0.28 if label == "fraud" else 0.02
557
  return self._clamp(0.75 + task_bias + label_bias + anomaly_strength * 0.35, 0.55, 1.85)
 
 
 
 
 
1
+ """Deterministic FraudShield snapshot loader built from public fraud data."""
2
 
3
  from __future__ import annotations
4
 
5
  import hashlib
6
  import json
7
  import logging
8
+ from datetime import datetime, timedelta, timezone
9
  from pathlib import Path
10
  from typing import Any, Dict, List, Tuple
11
 
 
14
 
15
  logger = logging.getLogger(__name__)
16
 
17
+ PRIMARY_SOURCE_ID = "kaggle_creditcardfraud"
18
+ BUNDLE_SCHEMA_VERSION = "2.0"
19
+
20
+ PUBLIC_SOURCE_CATALOG: Dict[str, Dict[str, str]] = {
21
+ PRIMARY_SOURCE_ID: {
22
+ "provider": "Kaggle / ULB",
23
+ "dataset_id": "mlg-ulb/creditcardfraud",
24
+ "title": "Credit Card Fraud Detection",
25
+ "source_url": "https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud",
26
+ "license_note": "Refer to the dataset page for license and usage terms.",
27
+ }
28
+ }
29
+
30
  TASK_SPECS: Dict[str, Dict[str, Any]] = {
31
  "easy": {
32
  "per_class": 12,
 
49
  }
50
 
51
 
52
+ class FraudDataLoader:
53
+ """Loads the committed snapshot or rebuilds it from the public source CSV."""
54
 
55
  def __init__(self, data_path: str = "data", seed: int = 42):
56
  self.seed = seed
 
60
  self.bundle_file = self.data_path / "fraudshield_cases.json"
61
  self.df: pd.DataFrame | None = None
62
  self.task_bundle: Dict[str, List[Dict[str, Any]]] = {}
63
+ self.bundle_metadata: Dict[str, Any] = {}
64
+ self.source_catalog = PUBLIC_SOURCE_CATALOG.copy()
65
 
66
+ def download_source_data(self, source_id: str = PRIMARY_SOURCE_ID) -> bool:
67
+ """Download the public source dataset used to build the local snapshot."""
68
+
69
+ if source_id != PRIMARY_SOURCE_ID:
70
+ raise ValueError(f"Unsupported source_id: {source_id}")
71
 
72
  try:
73
  import kaggle
74
 
75
+ logger.info("Downloading public source dataset %s...", source_id)
76
  kaggle.api.dataset_download_files(
77
+ self.source_catalog[source_id]["dataset_id"],
78
  path=str(self.data_path),
79
  unzip=True,
80
  )
81
  logger.info("Downloaded source data to %s", self.data_path)
82
  return True
83
  except Exception as exc: # pragma: no cover - external dependency
84
+ logger.error("Failed to download source data: %s", exc)
85
  return False
86
 
87
+ def download_data(self) -> bool:
88
+ """Backward-compatible wrapper for the old method name."""
89
+
90
+ return self.download_source_data()
91
+
92
+ def load_bundle(self) -> bool:
93
+ """Load the compact snapshot, or build it from the local source CSV."""
94
 
95
  if self.bundle_file.exists():
96
+ payload = json.loads(self.bundle_file.read_text(encoding="utf-8"))
97
+ self.task_bundle = payload["tasks"]
98
+ self.bundle_metadata = self._normalize_bundle_metadata(payload)
99
+ logger.info(
100
+ "Loaded FraudShield snapshot %s from %s",
101
+ self.bundle_metadata.get("snapshot_id", "unknown"),
102
+ self.bundle_file,
103
+ )
104
  return True
105
 
106
  if not self.csv_file.exists():
107
  logger.error("Neither %s nor %s is available.", self.bundle_file, self.csv_file)
108
  return False
109
 
110
+ logger.info("Building FraudShield snapshot from %s", self.csv_file)
111
  self.df = pd.read_csv(self.csv_file)
112
  self.task_bundle = self._build_task_bundle()
113
+ payload = self._build_bundle_payload()
114
+ self.bundle_metadata = payload["metadata"]
115
  self.bundle_file.write_text(json.dumps(payload, indent=2), encoding="utf-8")
116
+ logger.info("Wrote deterministic FraudShield snapshot to %s", self.bundle_file)
117
  return True
118
 
119
+ def load_data(self) -> bool:
120
+ """Backward-compatible wrapper for the old method name."""
121
+
122
+ return self.load_bundle()
123
+
124
+ def get_bundle_summary(self) -> Dict[str, Any]:
125
+ """Return source and snapshot metadata for docs, APIs, and evals."""
126
+
127
+ if not self.bundle_metadata:
128
+ return {}
129
+
130
+ sources = self.bundle_metadata.get("sources", [])
131
+ return {
132
+ "snapshot_id": self.bundle_metadata.get("snapshot_id"),
133
+ "schema_version": self.bundle_metadata.get("schema_version"),
134
+ "generated_at": self.bundle_metadata.get("generated_at"),
135
+ "seed": self.bundle_metadata.get("seed", self.seed),
136
+ "source_count": len(sources),
137
+ "sources": [
138
+ {
139
+ "source_id": source.get("source_id"),
140
+ "provider": source.get("provider"),
141
+ "title": source.get("title"),
142
+ "dataset_id": source.get("dataset_id"),
143
+ "source_url": source.get("source_url"),
144
+ }
145
+ for source in sources
146
+ ],
147
+ "task_sizes": {task_name: len(cases) for task_name, cases in self.task_bundle.items()},
148
+ }
149
+
150
  def get_task_cases(self, task: str) -> List[Dict[str, Any]]:
151
  """Return the full case records for a given task."""
152
 
153
  if not self.task_bundle:
154
+ raise RuntimeError("Data not loaded. Call load_bundle() first.")
155
  if task not in self.task_bundle:
156
  raise ValueError(f"Unknown task: {task}")
157
  return list(self.task_bundle[task])
 
173
  [case["label"] for case in hard_cases],
174
  )
175
 
176
+ def _normalize_bundle_metadata(self, payload: Dict[str, Any]) -> Dict[str, Any]:
177
+ """Support both the original bundle shape and the new metadata-rich one."""
178
+
179
+ metadata = payload.get("metadata")
180
+ if metadata:
181
+ return metadata
182
+
183
+ source_name = payload.get("source", "Public fraud source")
184
+ return {
185
+ "snapshot_id": "fraudshield-realworld-v1",
186
+ "schema_version": "1.0",
187
+ "generated_at": None,
188
+ "seed": payload.get("seed", self.seed),
189
+ "sources": [
190
+ {
191
+ "source_id": PRIMARY_SOURCE_ID,
192
+ "provider": self.source_catalog[PRIMARY_SOURCE_ID]["provider"],
193
+ "title": source_name,
194
+ "dataset_id": self.source_catalog[PRIMARY_SOURCE_ID]["dataset_id"],
195
+ "source_url": self.source_catalog[PRIMARY_SOURCE_ID]["source_url"],
196
+ }
197
+ ],
198
+ }
199
+
200
+ def _build_bundle_payload(self) -> Dict[str, Any]:
201
+ """Build the full snapshot payload written to disk."""
202
+
203
+ sources = [
204
+ {
205
+ "source_id": source_id,
206
+ **details,
207
+ }
208
+ for source_id, details in self.source_catalog.items()
209
+ if source_id == PRIMARY_SOURCE_ID
210
+ ]
211
+ metadata = {
212
+ "snapshot_id": "fraudshield-realworld-v2",
213
+ "schema_version": BUNDLE_SCHEMA_VERSION,
214
+ "generated_at": datetime.now(timezone.utc).isoformat(),
215
+ "seed": self.seed,
216
+ "build_notes": (
217
+ "Runtime uses this frozen snapshot only. Public source downloads are optional and "
218
+ "intended for rebuilding the snapshot offline."
219
+ ),
220
+ "sources": sources,
221
+ }
222
+ return {
223
+ "metadata": metadata,
224
+ "tasks": self.task_bundle,
225
+ }
226
+
227
  def _build_task_bundle(self) -> Dict[str, List[Dict[str, Any]]]:
228
+ """Create deterministic tasks from the public source dataset."""
229
 
230
  if self.df is None:
231
  raise RuntimeError("Source dataframe is not loaded.")
 
307
  label: str,
308
  local_index: int,
309
  ) -> Dict[str, Any]:
310
+ """Convert a source row into one deterministic marketplace case."""
311
 
312
  row_id = int(row["row_id"])
313
  anomaly_strength = float(row["case_score"])
 
364
  )
365
 
366
  business_cost = self._business_cost(task_name, label, anomaly_strength)
367
+ return {
368
  "transaction_id": f"{task_name}_{label}_{row_id}",
369
  "label": label,
370
  "risk_score": round(base_risk, 4),
 
372
  "transaction_data": transaction_data,
373
  "historical_context": historical_context,
374
  }
 
375
 
376
  @staticmethod
377
  def _timestamp_from_seconds(raw_seconds: float) -> str:
 
636
  fraud_ring_group: int | None,
637
  flash_sale_group: int | None,
638
  ) -> Dict[str, Any]:
639
+ seller_velocity = self._stable_int(
640
+ 1,
641
+ 18 if task_name != "hard" else 35,
642
+ "seller-velocity",
643
+ task_name,
644
+ label,
645
+ row_id,
646
+ )
647
  linked_cards = self._stable_int(1, 4 if task_name == "easy" else 8, "linked-cards", task_name, label, row_id)
648
  recent_refunds = self._stable_int(0, 2 if task_name == "easy" else 6, "refunds", task_name, label, row_id)
649
+ cluster_alert = round(
650
+ self._clamp(0.20 + anomaly_strength * 0.55 + (0.10 if fraud_ring_group is not None else 0.0)),
651
+ 3,
652
+ )
653
 
654
  note = TASK_SPECS[task_name]["focus"]
655
  if fraud_ring_group is not None:
 
659
 
660
  return {
661
  "task_focus": TASK_SPECS[task_name]["focus"],
662
+ "snapshot_id": self.bundle_metadata.get("snapshot_id") if self.bundle_metadata else "fraudshield-realworld-v1",
663
+ "source_id": PRIMARY_SOURCE_ID,
664
  "seller_transactions_1h": seller_velocity,
665
  "linked_cards_7d": linked_cards,
666
  "recent_refunds_7d": recent_refunds,
 
674
  task_bias = {"easy": 0.00, "medium": 0.10, "hard": 0.18}[task_name]
675
  label_bias = 0.28 if label == "fraud" else 0.02
676
  return self._clamp(0.75 + task_bias + label_bias + anomaly_strength * 0.35, 0.55, 1.85)
677
+
678
+
679
+ # Backward-compatible alias to avoid breaking older imports.
680
+ KaggleDataLoader = FraudDataLoader
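The `_normalize_bundle_metadata` change above lets the loader accept both the original bundle shape (top-level `seed` and `source`) and the new one (a `metadata` object). A minimal standalone sketch of that idea follows, assuming only the field names visible in this diff. The function name `normalize_metadata`, the `default_seed` parameter, and the trimmed source entry are illustrative and not part of the repository.

```python
from typing import Any, Dict

LEGACY_SNAPSHOT_ID = "fraudshield-realworld-v1"  # value the diff assigns to legacy bundles
PRIMARY_SOURCE_ID = "kaggle_creditcardfraud"


def normalize_metadata(payload: Dict[str, Any], default_seed: int = 42) -> Dict[str, Any]:
    """Return a metadata dict for either bundle shape (illustrative helper)."""
    metadata = payload.get("metadata")
    if metadata:
        # New-style bundle: metadata is already present, so pass it through.
        return metadata
    # Legacy bundle: synthesize metadata from the old top-level keys.
    return {
        "snapshot_id": LEGACY_SNAPSHOT_ID,
        "schema_version": "1.0",
        "generated_at": None,
        "seed": payload.get("seed", default_seed),
        "sources": [
            {
                "source_id": PRIMARY_SOURCE_ID,
                "title": payload.get("source", "Public fraud source"),
            }
        ],
    }


# Two hand-written payloads, one per shape, for illustration only.
legacy = {"seed": 7, "source": "Kaggle creditcardfraud", "tasks": {}}
modern = {"metadata": {"snapshot_id": "fraudshield-realworld-v2", "seed": 42}, "tasks": {}}

legacy_meta = normalize_metadata(legacy)
modern_meta = normalize_metadata(modern)
print(legacy_meta["snapshot_id"], legacy_meta["seed"])
print(modern_meta["snapshot_id"])
```

Keeping the fallback in one place means callers such as `get_bundle_summary()` can read `snapshot_id`, `seed`, and `sources` without caring which shape was on disk.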
download_kaggle_data.py CHANGED
@@ -1,5 +1,5 @@
1
  #!/usr/bin/env python3
2
- """Download the Kaggle source CSV and regenerate the compact FraudShield bundle."""
3
 
4
  from __future__ import annotations
5
 
@@ -7,14 +7,14 @@ import logging
7
  import sys
8
  from pathlib import Path
9
 
10
- from data_loader import KaggleDataLoader
11
 
12
  logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
13
  logger = logging.getLogger(__name__)
14
 
15
 
16
  def check_kaggle_setup() -> bool:
17
- """Validate that the local Kaggle token exists."""
18
 
19
  kaggle_json = Path.home() / ".kaggle" / "kaggle.json"
20
  if kaggle_json.exists():
@@ -22,27 +22,31 @@ def check_kaggle_setup() -> bool:
22
  return True
23
 
24
  logger.error("kaggle.json was not found at %s", kaggle_json)
25
- logger.error("Create a Kaggle API token and place it there before running this script.")
26
  return False
27
 
28
 
29
  def main() -> int:
30
- """Download or refresh the source dataset, then rebuild the task bundle."""
31
 
32
- if not check_kaggle_setup():
33
- return 1
34
 
35
- loader = KaggleDataLoader(data_path="data", seed=42)
36
- if not loader.download_data():
37
- return 1
 
 
 
 
38
 
39
  if loader.bundle_file.exists():
40
  loader.bundle_file.unlink()
41
 
42
- if not loader.load_data():
43
  return 1
44
 
45
- logger.info("Task bundle is ready at %s", loader.bundle_file)
 
46
  return 0
47
 
48
 
 
1
  #!/usr/bin/env python3
2
+ """Refresh the local FraudShield snapshot from the public source dataset."""
3
 
4
  from __future__ import annotations
5
 
 
7
  import sys
8
  from pathlib import Path
9
 
10
+ from data_loader import FraudDataLoader
11
 
12
  logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
13
  logger = logging.getLogger(__name__)
14
 
15
 
16
  def check_kaggle_setup() -> bool:
17
+ """Validate that the local Kaggle token exists when a download is needed."""
18
 
19
  kaggle_json = Path.home() / ".kaggle" / "kaggle.json"
20
  if kaggle_json.exists():
 
22
  return True
23
 
24
  logger.error("kaggle.json was not found at %s", kaggle_json)
25
+ logger.error("Create a Kaggle API token and place it there before downloading source data.")
26
  return False
27
 
28
 
29
  def main() -> int:
30
+ """Rebuild the FraudShield snapshot from the local CSV or download it first if missing."""
31
 
32
+ loader = FraudDataLoader(data_path="data", seed=42)
 
33
 
34
+ if not loader.csv_file.exists():
35
+ if not check_kaggle_setup():
36
+ return 1
37
+ if not loader.download_source_data():
38
+ return 1
39
+ else:
40
+ logger.info("Found existing public source CSV at %s", loader.csv_file)
41
 
42
  if loader.bundle_file.exists():
43
  loader.bundle_file.unlink()
44
 
45
+ if not loader.load_bundle():
46
  return 1
47
 
48
+ logger.info("Snapshot summary: %s", loader.get_bundle_summary())
49
+ logger.info("FraudShield snapshot is ready at %s", loader.bundle_file)
50
  return 0
51
 
52
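Each case's `historical_context` now carries a `snapshot_id`, and the payload also records one under `metadata`, so it may be worth checking after a rebuild that the two agree. A rough sketch follows, assuming the `metadata`/`tasks` layout written by `_build_bundle_payload`. `find_snapshot_mismatches` is a hypothetical helper, not something in the repository.

```python
from typing import Any, Dict, List


def find_snapshot_mismatches(payload: Dict[str, Any]) -> List[str]:
    """List transaction_ids whose per-case snapshot_id differs from the metadata."""
    expected = payload.get("metadata", {}).get("snapshot_id")
    mismatched: List[str] = []
    for cases in payload.get("tasks", {}).values():
        for case in cases:
            actual = case.get("historical_context", {}).get("snapshot_id")
            if actual != expected:
                mismatched.append(case.get("transaction_id", "<unknown>"))
    return mismatched


# Tiny in-memory payload for illustration; the real file is data/fraudshield_cases.json.
sample = {
    "metadata": {"snapshot_id": "fraudshield-realworld-v2"},
    "tasks": {
        "easy": [
            {
                "transaction_id": "easy_fraud_1",
                "historical_context": {"snapshot_id": "fraudshield-realworld-v2"},
            },
            {
                "transaction_id": "easy_legitimate_2",
                "historical_context": {"snapshot_id": "fraudshield-realworld-v1"},
            },
        ]
    },
}

mismatches = find_snapshot_mismatches(sample)
print(mismatches)
```

In this diff the per-case context appears to fall back to `fraudshield-realworld-v1` when `bundle_metadata` is still empty, while `_build_bundle_payload` writes `fraudshield-realworld-v2`, so a check along these lines could flag a freshly rebuilt snapshot.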
 
fraudshield_env.py CHANGED
@@ -6,7 +6,7 @@ import uuid
6
  from datetime import datetime
7
  from typing import Any, Dict, List
8
 
9
- from data_loader import KaggleDataLoader
10
  from models import (
11
  DecisionEnum,
12
  EpisodeState,
@@ -25,7 +25,7 @@ class FraudShieldEnvironment:
25
 
26
  def __init__(self, data_path: str = "data", seed: int = 42):
27
  self.seed = seed
28
- self.data_loader = KaggleDataLoader(data_path=data_path, seed=seed)
29
  self.data_loaded = False
30
 
31
  self.episode_id = ""
@@ -47,16 +47,21 @@ class FraudShieldEnvironment:
47
  TaskDifficulty.HARD: 48,
48
  }
49
 
50
- def load_kaggle_data(self) -> bool:
51
- """Load the bundled cases or build them from the local CSV."""
52
 
53
  self.data_loaded = self.data_loader.load_data()
54
  return self.data_loaded
55
 
 
 
 
 
 
56
  def ensure_data_loaded(self) -> None:
57
  """Load data on demand so server startup can stay simple."""
58
 
59
- if not self.data_loaded and not self.load_kaggle_data():
60
  raise RuntimeError("FraudShield data bundle could not be loaded.")
61
 
62
  def reset(self, task: str = "easy") -> ResetResult:
@@ -83,6 +88,7 @@ class FraudShieldEnvironment:
83
  "episode_id": self.episode_id,
84
  "task": task,
85
  "task_focus": observation.historical_context.get("task_focus") if observation.historical_context else None,
 
86
  "max_steps": self.max_steps[self.current_task],
87
  "num_transactions": len(self.current_cases),
88
  "fraud_count": sum(1 for label in self.ground_truth_labels if label == "fraud"),
 
6
  from datetime import datetime
7
  from typing import Any, Dict, List
8
 
9
+ from data_loader import FraudDataLoader
10
  from models import (
11
  DecisionEnum,
12
  EpisodeState,
 
25
 
26
  def __init__(self, data_path: str = "data", seed: int = 42):
27
  self.seed = seed
28
+ self.data_loader = FraudDataLoader(data_path=data_path, seed=seed)
29
  self.data_loaded = False
30
 
31
  self.episode_id = ""
 
47
  TaskDifficulty.HARD: 48,
48
  }
49
 
50
+ def load_data(self) -> bool:
51
+ """Load the committed snapshot or rebuild it from the local public source CSV."""
52
 
53
  self.data_loaded = self.data_loader.load_data()
54
  return self.data_loaded
55
 
56
+ def load_kaggle_data(self) -> bool:
57
+ """Backward-compatible wrapper for the previous method name."""
58
+
59
+ return self.load_data()
60
+
61
  def ensure_data_loaded(self) -> None:
62
  """Load data on demand so server startup can stay simple."""
63
 
64
+ if not self.data_loaded and not self.load_data():
65
  raise RuntimeError("FraudShield data bundle could not be loaded.")
66
 
67
  def reset(self, task: str = "easy") -> ResetResult:
 
                 "episode_id": self.episode_id,
                 "task": task,
                 "task_focus": observation.historical_context.get("task_focus") if observation.historical_context else None,
+                "data_snapshot": self.data_loader.get_bundle_summary(),
                 "max_steps": self.max_steps[self.current_task],
                 "num_transactions": len(self.current_cases),
                 "fraud_count": sum(1 for label in self.ground_truth_labels if label == "fraud"),
inference.py CHANGED
@@ -67,7 +67,7 @@ def main() -> Dict[str, object]:
     logger.info("%s", "=" * 72)
 
     env = FraudShieldEnvironment(data_path="data", seed=42)
-    if not env.load_kaggle_data():
+    if not env.load_data():
         logger.error("FraudShield data could not be loaded from ./data")
         sys.exit(1)
 
@@ -99,6 +99,7 @@ def main() -> Dict[str, object]:
         "api_base_url": os.getenv("API_BASE_URL", "https://router.huggingface.co/v1"),
         "model_name": os.getenv("MODEL_NAME", ""),
         "seed": 42,
+        "data_snapshot": env.data_loader.get_bundle_summary(),
         "tasks": {
             "easy": len(easy_ground_truth),
             "medium": len(medium_ground_truth),
openenv.yaml CHANGED
@@ -1,6 +1,6 @@
 name: fraudshield
 version: 0.2.0
-description: OpenEnv environment for marketplace fraud review built from curated Kaggle transactions.
+description: OpenEnv environment for marketplace fraud review built from a frozen public-data snapshot.
 author: Devika J
 
 environment:
pyproject.toml CHANGED
@@ -5,7 +5,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "fraudshield"
 version = "0.2.0"
-description = "OpenEnv fraud review environment built from curated Kaggle transactions"
+description = "OpenEnv fraud review environment built from a frozen public-data snapshot"
 readme = "README.md"
 requires-python = ">=3.10"
 license = { text = "MIT" }
server/app.py CHANGED
@@ -25,7 +25,7 @@ env = FraudShieldEnvironment(data_path=str(DATA_PATH), seed=42)
 async def lifespan(_: FastAPI):
     """Load the bundled task set when the API process starts."""
 
-    if not env.load_kaggle_data():
+    if not env.load_data():
         logger.error("FraudShield failed to load its bundled data from %s", DATA_PATH)
     yield
 
@@ -43,7 +43,7 @@ async def health_check() -> Dict[str, Any]:
     """Container health probe."""
 
     if not env.data_loaded:
-        env.load_kaggle_data()
+        env.load_data()
 
     return {
         "status": "healthy" if env.data_loaded else "degraded",
@@ -103,12 +103,13 @@ async def get_info() -> Dict[str, Any]:
     return {
         "name": "fraudshield",
         "version": "0.2.0",
-        "description": "E-commerce fraud review environment built from curated Kaggle cases.",
+        "description": "E-commerce fraud review environment built from a frozen public-data snapshot.",
         "tasks": {
             task.value: {"max_steps": max_steps}
             for task, max_steps in env.max_steps.items()
         },
         "data_path": str(DATA_PATH),
+        "data_snapshot": env.data_loader.get_bundle_summary(),
     }
 
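
The `health_check` hunk in `server/app.py` retries the load on each probe and reports `degraded` until it succeeds. The sketch below isolates that behavior without FastAPI. `StubEnv` and `health_payload` are hypothetical names introduced for illustration, and the stub is assumed to succeed on its second load attempt so that both states show up.

```python
from typing import Any, Dict


class StubEnv:
    """Hypothetical stand-in for the environment; load_data() succeeds on the second call."""

    def __init__(self) -> None:
        self.data_loaded = False
        self.attempts = 0

    def load_data(self) -> bool:
        self.attempts += 1
        self.data_loaded = self.attempts >= 2
        return self.data_loaded


def health_payload(env: StubEnv) -> Dict[str, Any]:
    # Retry the load on each probe until it succeeds, as the health check does.
    if not env.data_loaded:
        env.load_data()
    return {"status": "healthy" if env.data_loaded else "degraded"}


env = StubEnv()
first = health_payload(env)   # first load attempt fails in this stub
second = health_payload(env)  # second attempt succeeds
print(first["status"], second["status"])
```

With this stub the first probe reports `degraded` and the second reports `healthy`; once the data is loaded, later probes skip the load call entirely.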