zirobtc committed · verified
Commit 2c39730 · 1 Parent(s): b441d51

Upload folder using huggingface_hub
QUALITY_SCORE_ARCHITECTURE.md CHANGED
@@ -1,165 +1,13 @@
- # Token Quality / Health Score (q) - Architecture
-
- This document defines the "quality/health" scalar `q` used by Apollo.
-
- ## 1) What problem this solves
-
- We want a single number that captures **how healthy / organic vs controlled** a token looks, so a downstream trading policy (e.g., an RL agent) can treat it as a **risk/health input**.
-
- Key points:
- - This is **not model confidence**.
- - `q` is computed **offline** using a token's **full lifetime** (for labels / training targets).
- - At **inference**, the model predicts `q` from **partial observations**.
- - We avoid hard thresholds and raw-scale features (USD, SOL, counts) by using **within-regime distributions**.
-
- ## 2) Core idea (distribution-first, not rules-first)
-
- Raw totals (fees, volume, holders) are mostly **scale** and are extremely heavy-tailed. Using them directly:
- - makes the signal unstable across regimes,
- - makes it sensitive to market-wide shifts,
- - and invites hand-tuned weights ("human bias").
-
- Instead, we map each metric to a **percentile** within a comparable peer group, then aggregate.
-
- ## 3) Return bucketing (why it is required)
-
- The dataset is highly imbalanced: most tokens die early (<2-3x), while a tiny tail produces 10x-1000x outcomes.
-
- If you compute percentiles globally:
- - 100x tokens will always dominate the "good" percentiles for scale metrics,
- - and "quality" will collapse into "return magnitude".
-
- So we compute distributions **within return regimes**.
-
- ### 3.1 Bucket definition (example)
-
- Let `R_max` be the token's lifetime max return multiple (e.g., ATH / launch).
-
- Use coarse buckets for the bulk and finer buckets for the tail, e.g.:
- - B0: `R_max < 3`
- - B1: `3 <= R_max < 5`
- - B2: `5 <= R_max < 10`
- - B3: `10 <= R_max < 20`
- - B4: `20 <= R_max < 100`
- - B5: `100 <= R_max < 10_000`
-
- Notes:
- - If a bucket has too few samples, merge it with a neighbor.
- - For the extreme tail you can also replace fixed buckets with **quantile buckets** on `log(R_max)` to keep sample counts stable.
-
- Interpretation (important):
- - `q` is **relative within the bucket**.
- - The "best garbage" can have high `q` in B0.
- - A 100x token can have low `q` in B4 if it looks worst vs other 100x+ tokens.
-
- This is intentional: return and quality are different axes.
-
- ## 4) Feature set and sign conventions
-
- We want `q` to increase for "healthy/organic" structure and decrease for "controlled/manipulated" structure.
-
- All features below are evaluated **within the token's return bucket**.
-
- ### 4.1 Scale / activity (high is usually better within-bucket)
-
- Use log transforms for stability before percentiles:
- - `log1p(total_volume_usd)`
- - `log1p(total_fees_sol)`
- - `log1p(unique_holders)`
- - `log1p(time_to_ath_sec)` (optional; see note below)
-
- Ratio features (less pure scale):
- - `fees_per_volume = total_fees_sol / (total_volume_usd + eps)`
- - `fees_per_trade = total_fees_sol / (n_trades + eps)` (if `n_trades` exists)
- - `holders_per_trade = unique_holders / (n_trades + eps)` (if `n_trades` exists)
- - `holders_per_volume = unique_holders / (total_volume_usd + eps)`
-
- Rationale:
- - Fees and fee-per-* help separate "real urgency / competition" from "cheap wash".
- - Holders and holders-per-* help separate broad participation from concentrated looping.
-
- ### 4.2 Manipulation / control (high is worse; flip sign)
-
- These are typically "the higher, the less healthy":
- - `snipers_pct_supply_top70`
- - `bundled_pct_supply`
- - `dev_hold_pct_supply`
- - `insiders_pct_supply`
-
- We treat exceptions as rare; the model can learn edge cases from context, but the label should reflect the dominant interpretation.
-
- ### 4.3 Time-to-ATH note
-
- `time_to_ath_sec` can behave differently across return buckets.
- - In high-return buckets, very short times can look like a single spike / control.
- - In low-return buckets, many tokens have near-zero times because they never move.
-
- Include it only if it improves downstream behavior; keep it **bucket-relative** either way.
-
- ## 5) Turning raw metrics into a signed scalar
-
- We want a single `q` in `[-1, +1]` with direction:
- - `+1` = looks healthiest vs peers in the same return bucket
- - `-1` = looks most unhealthy vs peers in the same return bucket
-
- ### 5.1 Within-bucket percentile (ECDF)
-
- For each feature value `x_i`:
- - compute percentile `p_i = ECDF_b(x_i)` using only tokens in bucket `b`
- - `p_i` is in `[0, 1]`
-
- Implementation detail:
- - Use a rank-based ECDF with a small offset to avoid exact 0/1 if desired:
-   - `p_i = (rank(x_i) - 0.5) / n`
-
- ### 5.2 Signed percentile
-
- Convert to a signed value:
- - `s_i = 2 * p_i - 1` (now `s_i` is in `[-1, +1]`)
-
- If "high is bad" for that feature, flip it:
- - `s_i := -s_i`
-
- This gives direction + magnitude in a single number.
-
- ### 5.3 Aggregate without hand weights
-
- To avoid hand-tuned weights, use a symmetric aggregator:
- - `q_raw = mean_i(s_i)`
-
- Optional robustness:
- - clip each `s_i` to `[-0.99, 0.99]` before averaging (limits extreme leverage)
- - use a trimmed mean (drop the top/bottom k% of `s_i`) if a single metric can be noisy
-
- ### 5.4 Optional: re-rank the aggregate (final calibration)
-
- If you want the final `q` to be strictly comparable across time / retrains and more uniform within a bucket:
- - `q = 2 * ECDF_b(q_raw) - 1`
-
- This keeps the "relative within bucket" meaning while stabilizing scale.
-
- ## 6) Training vs inference (how it is used)
-
- Offline labeling (training target):
- 1) Compute `R_max` from the full lifetime.
- 2) Assign return bucket `b`.
- 3) Compute all chosen metrics from the full lifetime.
- 4) Convert metrics -> signed percentiles -> `q`.
-
- Inference (model output):
- - The model only sees partial history and must predict the *final* `q` (computed above).
- - The trading policy uses predicted return signals + predicted `q` to decide position sizing / risk.
-
- ## 7) Practical notes
-
- - Use `eps` (e.g., `1e-9`) in denominators to avoid divide-by-zero.
- - If a metric is missing for a token, drop it from the mean for that token (or impute it with the bucket median).
- - When bucket sample counts drift, prefer merging buckets rather than letting the ECDF get noisy.
- - Recompute distributions on the same "source-of-truth" dataset used for training (not ad-hoc caches).
-
- ## 8) Summary
-
- `q` is a **return-regime-relative**, **distribution-normalized**, **signed** health score:
- - It is not a threshold classifier.
- - It avoids raw-scale dependence and hand weighting.
- - It cleanly separates "made money" (return) from "looks healthy" (quality).
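For reference, the deleted pipeline (assign a return bucket, convert each metric to a rank-based ECDF percentile, sign it, flip "high is bad" features, then average) can be sketched in a few lines. This is a toy illustration with made-up numbers; `assign_bucket`, `signed_percentile`, and `health_score` are hypothetical names, not Apollo's actual API.

```python
import numpy as np

# Upper bounds of buckets B0..B5 from section 3.1 (B5 clamps the extreme tail)
BUCKET_EDGES = [3, 5, 10, 20, 100, 10_000]

def assign_bucket(r_max: float) -> int:
    """Return the index of the first bucket whose upper bound exceeds r_max."""
    for b, edge in enumerate(BUCKET_EDGES):
        if r_max < edge:
            return b
    return len(BUCKET_EDGES) - 1

def signed_percentile(values: np.ndarray, high_is_bad: bool = False) -> np.ndarray:
    """Rank-based ECDF p = (rank - 0.5) / n mapped to [-1, +1]; flipped if high is bad."""
    ranks = values.argsort().argsort()      # 0-based ranks (double argsort trick)
    p = (ranks + 0.5) / len(values)         # ECDF with half-rank offset, avoids exact 0/1
    s = 2.0 * p - 1.0
    return -s if high_is_bad else s

def health_score(features: dict, flip: dict) -> np.ndarray:
    """Average signed percentiles across features; all tokens assumed in the same bucket."""
    cols = [signed_percentile(np.asarray(v, float), flip[k]) for k, v in features.items()]
    return np.clip(np.mean(cols, axis=0), -0.99, 0.99)

# Toy bucket of 5 tokens: more volume is good, more sniper-held supply is bad.
feats = {
    "log_volume":  np.log1p([1e3, 5e4, 2e5, 8e5, 3e6]),
    "snipers_pct": [0.60, 0.40, 0.25, 0.10, 0.02],
}
q = health_score(feats, flip={"log_volume": False, "snipers_pct": True})
```

The double `argsort` is a compact way to get tie-free ranks; with ties, an averaged-rank ECDF (e.g., `scipy.stats.rankdata`) would be the safer choice.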
 
+ OK I think I see the real issue now.
+
+ The weighted sampling balances which tokens the model sees equally. But the labels (the actual return values) are determined by the random T_cutoff within each token, not by the token's class.
+
+ Even a class 5 token (100x return) only pumps in a tiny window of its lifetime. If a token has 1000 trades and the pump happens between trades 200-400, then:
+
+ - T_cutoff at trade 50 → returns might be +500%
+ - T_cutoff at trade 500 → returns are -80% (post-pump bleed)
+ - T_cutoff at trade 700 → returns are -90%
+ - T_cutoff at trade 900 → returns are -95%
+
+ So even for class 5 tokens, 80%+ of the cached training samples have negative ground-truth labels. The model is correctly learning that at any random moment, even a "good" token is most likely going down. Class balancing doesn't change the fact that the actual Y labels are overwhelmingly negative across all classes.
+
+ The model isn't broken; it learned exactly what the data showed it. The issue is that the training setup doesn't teach it to recognize the pre-pump moment specifically.
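The argument above is easy to reproduce numerically. The sketch below uses an assumed, stylized price path (1000 trades, a 6x pump between trades 200-400, a slow bleed afterwards); `price_at`, `forward_return`, and the 100-trade horizon are invented for illustration. Sampling T_cutoff uniformly, well over half the labels come out negative even though the token itself is a clear winner.

```python
import numpy as np

def price_at(trade_idx: int) -> float:
    """Stylized price path: flat, ramps 1x -> 6x over trades 200-400, then decays."""
    if trade_idx < 200:
        return 1.0
    if trade_idx < 400:
        return 1.0 + 5.0 * (trade_idx - 200) / 200
    return 6.0 * (0.999 ** (trade_idx - 400))  # post-pump bleed

def forward_return(cutoff: int, horizon: int = 100) -> float:
    """Label: return over the next `horizon` trades after the cutoff."""
    return price_at(min(cutoff + horizon, 999)) / price_at(cutoff) - 1.0

# Sample T_cutoff uniformly over the token's life, as the cache generation does.
rng = np.random.default_rng(0)
cutoffs = rng.integers(0, 900, size=5000)
labels = np.array([forward_return(int(c)) for c in cutoffs])
frac_negative = float(np.mean(labels < 0))
```

In this toy the majority of uniformly sampled cutoffs land in the post-pump bleed, so most labels are negative; sharpening the decay or lengthening the bleed pushes the fraction toward the 80%+ described above.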
access/masking_policies.list ADDED
Binary file (1 byte)

access/quotas.list ADDED
Binary file (1 byte)

access/roles.list ADDED
Binary file (1 byte)

access/row_policies.list ADDED
Binary file (1 byte)

access/settings_profiles.list ADDED
Binary file (1 byte)

access/users.list ADDED
Binary file (1 byte)
 
data/data_loader.py CHANGED
@@ -124,7 +124,6 @@ class OracleDataset(Dataset):
         max_samples: Optional[int] = None,
         token_allowlist: Optional[List[str]] = None,
-        t_cutoff_seconds: int = 60,
         cache_dir: Optional[Union[str, Path]] = None,
         start_date: Optional[datetime.datetime] = None,
         min_trade_usd: float = 0.0,
@@ -161,10 +160,7 @@
-        # If a fetcher is provided, we can determine the number of samples.
-        # Otherwise, we are likely in a test mode where __len__ might not be called
-        # or is used with a mock length.
-        self.t_cutoff_seconds = max(0, int(t_cutoff_seconds or 0))
+
         self.token_allowlist = set(token_allowlist) if token_allowlist else None

         if self.cache_dir:
@@ -2606,7 +2602,7 @@
             pooler.pool_map[key] = {'item': emb.cpu().clone(), 'idx': old_entry['idx']}

-    def __cacheitem_context__(self, idx: int, num_samples_per_token: int = 1, encoder: Optional[Any] = None) -> List[Optional[Dict[str, Any]]]:
+    def __cacheitem_context__(self, idx: int, num_samples_per_token: int = 1, encoder: Optional[Any] = None, forced_cutoff_trade_idx: Optional[int] = None) -> List[Optional[Dict[str, Any]]]:
         """
         Generates fully processed training contexts for caching.
@@ -2957,7 +2953,14 @@
         results = []

         # Sample indices (with replacement if needed)
-        if num_samples_per_token >= len(eligible_indices):
+        if forced_cutoff_trade_idx is not None:
+            # Forced mode: use the exact trade index provided (for evaluation)
+            if forced_cutoff_trade_idx >= len(all_trades_sorted):
+                print(f"  WARN: forced_cutoff_trade_idx={forced_cutoff_trade_idx} >= total trades {len(all_trades_sorted)}, clamping.")
+                forced_cutoff_trade_idx = len(all_trades_sorted) - 2
+            sampled_indices = [forced_cutoff_trade_idx]
+            print(f"  Using forced T_cutoff at trade index {forced_cutoff_trade_idx}")
+        elif num_samples_per_token >= len(eligible_indices):
             sampled_indices = eligible_indices.copy()
         else:
             sampled_indices = random.sample(eligible_indices, num_samples_per_token)
ingest.sh CHANGED
@@ -20,7 +20,7 @@ error() { echo -e "${RED}[ERROR]${NC} $1"; exit 1; }
 #===============================================================================
 header "Step 5-6/7: Processing Epochs (Download → Ingest → Delete)"

-EPOCHS=(844 845 846 847 848 849 850 851)
+EPOCHS=(852 853)

 log "Processing epochs one at a time to minimize disk usage..."
@@ -39,9 +39,9 @@ for epoch in "${EPOCHS[@]}"; do
         error "Failed to download epoch ${epoch}. Cannot continue."
     }

-    # Step 2: Ingest (always pass --merge-neo4j; auto-detect handles empty DB)
-    log "  [2/3] Ingesting epoch ${epoch} into databases..."
-    python scripts/ingest_epoch.py --epoch "$epoch" --merge-neo4j || {
+    # Step 2: Ingest (ClickHouse only)
+    log "  [2/3] Ingesting epoch ${epoch} into ClickHouse database..."
+    python scripts/ingest_epoch.py --epoch "$epoch" --skip-neo4j || {
         error "Ingestion failed for epoch ${epoch}. Cannot continue."
     }
 
log.log CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:656c6818f224b26869b5d0ae10f6b522ff7eb5c7b1b3aeb59b34c3db218338a9
-size 11360
+oid sha256:11ac013f8e91ad65475b8106a5a072dc42f67e0773ddc4a50825e316c578e0d4
+size 3472
models/multi_modal_processor.py CHANGED
@@ -81,8 +81,9 @@ class MultiModalEncoder:
         is_text = isinstance(x[0], str)

         autocast_dtype = self.dtype if self.dtype in [torch.float16, torch.bfloat16] else None
+        device_str = self.device.type if isinstance(self.device, torch.device) else self.device

-        with torch.autocast(device_type=self.device, dtype=autocast_dtype, enabled=(autocast_dtype is not None)):
+        with torch.autocast(device_type=device_str, dtype=autocast_dtype, enabled=(autocast_dtype is not None)):
             try:
                 if is_text:
                     inputs = self.processor(text=x, return_tensors="pt", padding=True, truncation=True).to(self.device)
scripts/evaluate_sample.py CHANGED
@@ -23,6 +23,11 @@ from models.ohlc_embedder import OHLCEmbedder
 from models.model import Oracle
 import models.vocabulary as vocab
 from train import create_balanced_split
+from dotenv import load_dotenv
+from clickhouse_driver import Client as ClickHouseClient
+from neo4j import GraphDatabase
+from data.data_fetcher import DataFetcher
+from scripts.analyze_distribution import get_return_class_map

 def unlog_transform(tensor):
     """Invert the log1p transform applied during training."""
@@ -32,13 +37,13 @@ def unlog_transform(tensor):
 def parse_args():
     parser = argparse.ArgumentParser()
     parser.add_argument("--checkpoint", type=str, default="checkpoints/checkpoint-90000", help="Path to checkpoint dir")
-    parser.add_argument("--cache_dir", type=str, default="/workspace/apollo/data/cache", help="Path to dataset cache")
-    parser.add_argument("--sample_idx", type=int, default=None, help="Specific sample index to evaluate")
+    parser.add_argument("--sample_idx", type=str, default=None, help="Specific sample index or Mint Address to evaluate")
     parser.add_argument("--mixed_precision", type=str, default="bf16")
     parser.add_argument("--horizons_seconds", type=int, nargs="+", default=[300, 900, 1800, 3600, 7200])
     parser.add_argument("--quantiles", type=float, nargs="+", default=[0.1, 0.5, 0.9])
-    parser.add_argument("--seed", type=int, default=42)
-    parser.add_argument("--min_horizon", type=int, default=900, help="Ensure the sampled coin has ground truth for at least this horizon (in seconds)")
+    parser.add_argument("--seed", type=int, default=None)
+    parser.add_argument("--min_class", type=int, default=5, help="Filter out tokens with return class beneath this ID (e.g., 1 for >= 3x returns)")
+    parser.add_argument("--cutoff_trade_idx", type=int, default=600, help="Force the T_cutoff at this exact trade index (e.g., 10 = right after the 10th trade)")
     return parser.parse_args()

 def get_latest_checkpoint(checkpoint_dir):
@@ -52,6 +57,7 @@ def get_latest_checkpoint(checkpoint_dir):
     return None

 def main():
+    load_dotenv()
     args = parse_args()

     accelerator = Accelerator(mixed_precision=args.mixed_precision)
@@ -63,76 +69,49 @@ def main():
     elif accelerator.mixed_precision == 'fp16':
         init_dtype = torch.float16

-    print(f"Loading cached dataset from {args.cache_dir}...")
+    print("INFO: Initializing DB Connections for LIVE evaluation...")
+    clickhouse_host = os.getenv("CLICKHOUSE_HOST", "localhost")
+    clickhouse_port = int(os.getenv("CLICKHOUSE_PORT", 9000))
+    neo4j_uri = os.getenv("NEO4J_URI", "bolt://localhost:7687")
+    neo4j_user = os.getenv("NEO4J_USER", "neo4j")
+    neo4j_password = os.getenv("NEO4J_PASSWORD", "password")
+
+    clickhouse_client = ClickHouseClient(host=clickhouse_host, port=clickhouse_port)
+    neo4j_driver = GraphDatabase.driver(neo4j_uri, auth=(neo4j_user, neo4j_password))
+    data_fetcher = DataFetcher(clickhouse_client=clickhouse_client, neo4j_driver=neo4j_driver)
+
+    print(f"Loading live dataset generator...")
+
+    # We inject the data fetcher directly. No cache directories are used.
     dataset = OracleDataset(
-        data_fetcher=None,
+        data_fetcher=data_fetcher,
         fetcher_config=None,
         horizons_seconds=args.horizons_seconds,
         quantiles=args.quantiles,
-        max_samples=None,
-        t_cutoff_seconds=60,
-        cache_dir=args.cache_dir
+        cache_dir=None
     )

-    if len(dataset) == 0:
-        raise ValueError("Dataset is empty!")
-
-    # Optionally pick validation sample like in training
-    print("Creating balanced train/val split to pick a validation sample...")
-    _, val_indices, _ = create_balanced_split(dataset, n_val_per_class=10, seed=args.seed)
-
-    # Re-seed with system time so we don't pick the same sample every time
-    import time
-    random.seed(time.time())
-
-    # --- Filter by minimum horizon if requested ---
-    if args.min_horizon is not None and args.min_horizon in args.horizons_seconds:
-        print(f"Filtering dataset to find samples with ground truth >= {args.min_horizon}s...")
-        h_idx = args.horizons_seconds.index(args.min_horizon)
-        num_quantiles = len(args.quantiles)
-
-        valid_indices = []
-        # We search through a shuffled subset to avoid checking the whole dataset
-        search_pool = val_indices.copy()
-        random.shuffle(search_pool)
-        if not search_pool:
-            search_pool = list(range(len(dataset)))
-            random.shuffle(search_pool)
-
-        for idx in search_pool:
-            sample = dataset[idx]
-            if sample is None:
-                continue
-            mask = sample.get('labels_mask')
-            if mask is not None:
-                # Based on raw file inspection, mask is shape [H], so we index by h_idx directly
-                if h_idx < len(mask) and mask[h_idx] > 0.0:
-                    valid_indices.append(idx)
-            # Once we find a handful of valid ones, we can stop searching
-            if len(valid_indices) >= 10:
-                break
-
-        if valid_indices:
-            print(f"Found {len(valid_indices)} candidate samples with >= {args.min_horizon}s horizon.")
-            val_indices = valid_indices
-        else:
-            print(f"WARNING: No samples found with ground truth for horizon {args.min_horizon}s. Reverting to random pick.")
+    # Filter out manipulated/broken tokens and optionally enforce min_class
+    from models.vocabulary import MANIPULATED_CLASS_ID
+    print("INFO: Fetching Return Classification Map...")
+    return_class_map, _ = get_return_class_map(clickhouse_client)
+
+    min_class_thresh = args.min_class if args.min_class is not None else 0
+
+    original_len = len(dataset.sampled_mints)
+    dataset.sampled_mints = [
+        m for m in dataset.sampled_mints
+        if return_class_map.get(m['mint_address']) is not None
+        and return_class_map.get(m['mint_address']) != MANIPULATED_CLASS_ID
+        and return_class_map.get(m['mint_address']) >= min_class_thresh
+    ]
+    dataset.num_samples = len(dataset.sampled_mints)
+    print(f"INFO: Filtered tokens. {original_len} -> {len(dataset.sampled_mints)} valid tokens (class >= {min_class_thresh}).")

-    if args.sample_idx is not None:
-        if args.sample_idx >= len(dataset):
-            raise ValueError(f"Sample index {args.sample_idx} out of range [0, {len(dataset)-1}]")
-        sample_idx = args.sample_idx
-    else:
-        # Pick a random sample from validation set
-        if len(val_indices) > 0:
-            sample_idx = random.choice(val_indices)
-        else:
-            print("No validation indices found. Picking random sample from entire set.")
-            sample_idx = random.randint(0, len(dataset) - 1)
-
-    print(f"\nEvaluating on sample index: {sample_idx}")
+    if len(dataset) == 0:
+        raise ValueError("Dataset is empty. Are ClickHouse data and trade pipelines populated? (Check if min_return filtered everything out)")

-    # Initialize encoders and model
+    # Initialize encoders and model FIRST because we need multi_modal_encoder to compile context
     print("Initializing encoders...")
     multi_modal_encoder = MultiModalEncoder(dtype=init_dtype, device=device)
     time_encoder = ContextualTimeEncoder(dtype=init_dtype)
@@ -181,9 +160,6 @@
         model = accelerator.prepare(model)
     else:
         print(f"Loading checkpoint from {ckpt_path}...")
-        # Since we use accelerate, the state dict is usually split or in pytorch_model.bin/model.safetensors
-        # Using accelerate to load:
-        # We need to wrap it if we want to use `accelerator.load_state`
         model = accelerator.prepare(model)
         try:
             accelerator.load_state(ckpt_path)
@@ -202,7 +178,6 @@
         else:
             state_dict = torch.load(model_file, map_location="cpu")

-        # Unwrap model to load state
         uw_model = accelerator.unwrap_model(model)
         uw_model.load_state_dict(state_dict, strict=False)
         print("Successfully loaded weights directly.")
@@ -211,11 +186,51 @@

     model.eval()

-    # Get sample
-    raw_sample = dataset[sample_idx]
-    if raw_sample is None:
-        print("Sample is None!")
-        return
+    # Find a valid sample
+    valid_context_found = False
+    max_retries = 20
+    retries = 0
+    raw_sample = None
+    sample_mint_addr = None
+
+    while not valid_context_found and retries < max_retries:
+        if args.sample_idx is not None:
+            if isinstance(args.sample_idx, str) and not args.sample_idx.isdigit():
+                found_idx = next((i for i, m in enumerate(dataset.sampled_mints) if m['mint_address'] == args.sample_idx), None)
+                if found_idx is None:
+                    import datetime
+                    dataset.sampled_mints.append({'mint_address': args.sample_idx, 'creator_address': '', 'timestamp': datetime.datetime.now(datetime.timezone.utc)})
+                    sample_idx = len(dataset.sampled_mints) - 1
+                else:
+                    sample_idx = found_idx
+            else:
+                sample_idx = int(args.sample_idx)
+                if sample_idx >= len(dataset):
+                    raise ValueError(f"Sample index {sample_idx} out of range")
+        else:
+            sample_idx = random.randint(0, len(dataset.sampled_mints) - 1)
+
+        sample_mint_addr = dataset.sampled_mints[sample_idx]['mint_address']
+        print(f"Trying Token Address: {sample_mint_addr}")
+
+        contexts = dataset.__cacheitem_context__(sample_idx, num_samples_per_token=1, encoder=multi_modal_encoder, forced_cutoff_trade_idx=args.cutoff_trade_idx)
+
+        if not contexts or len(contexts) == 0 or contexts[0] is None:
+            print("  [Failed to generate valid context pattern, skipping...]")
+            retries += 1
+            if args.sample_idx is not None:
+                print("Specific sample requested but failed to generate context. Exiting.")
+                return
+            continue
+
+        raw_sample = contexts[0]
+        valid_context_found = True
+
+    if not valid_context_found:
+        print(f"Could not find a valid context after {max_retries} attempts.")
+        return
+
+    print(f"\nEvaluating precisely on Token Address: {sample_mint_addr}")

     batch = collator([raw_sample])
status CHANGED
@@ -1,3 +1,3 @@
-PID: 7085
-Started at: 2026-01-26 07:54:26
-Revision: 54508
+PID: 5825
+Started at: 2026-03-06 06:22:33
+Revision: 54510
wget-log.1 ADDED
@@ -0,0 +1,11 @@
+--2026-03-06 06:21:59--  https://debian.neo4j.com/neotechnology.gpg.key
+Resolving debian.neo4j.com (debian.neo4j.com)... 3.169.221.47, 3.169.221.34, 3.169.221.16, ...
+Connecting to debian.neo4j.com (debian.neo4j.com)|3.169.221.47|:443... connected.
+HTTP request sent, awaiting response... 200 OK
+Length: 3905 (3.8K) [application/pgp-keys]
+Saving to: 'STDOUT'
+
+  0%[                                  ]       0  --.-KB/s
+100%[=================================>]   3.81K  --.-KB/s    in 0s
+
+2026-03-06 06:21:59 (9.00 MB/s) - written to stdout [3905/3905]