Commit 97ee7e7 · Parent(s): e2c547b

update
Files changed:

- README.md +27 -0
- RESEARCH.md +37 -1
- inference.py +28 -5
- models.py +37 -0
- server/data/audience_overlap_matrix.json +11 -10
- server/viraltest_environment.py +210 -36
README.md
CHANGED

@@ -93,6 +93,33 @@ Tiered from [Buffer 2.1M study](https://buffer.com/resources/how-often-to-post-o

  | `monthly_strategic` | Medium | + tag discovery/exploitation + energy + consistency |
  | `monthly_competitive` | Hard | + growth vs competitors + differentiation + content diversity |

+ ## Regulator/Judge Mode (per-day audit)
+
+ Every day the env emits a deterministic, explainable `JudgeReport` on the observation:
+
+ ```python
+ JudgeReport(
+     policy_compliance=1.00,    # 1.0 - sum(weighted_violations); see _compute_judge_report
+     sustainability_risk=0.10,  # 0.4*(1-energy_min) + 0.3*sleep_debt + 0.3*low_energy_ratio
+     strategic_quality=0.96,    # 0.4*engagement_per_post + 0.3*intent_diversity + 0.3*format_diversity
+     explanation="compliance=1.00 risk=0.10 strategy=0.96 | no policy violations",
+     violations=[],             # human-readable rule breaks (Buffer 2.1M, Van Dongen, Cen 2024)
+ )
+ ```
+
+ Auditable rules (all sourced): >5 posts/day → fatigue cliff (Buffer 2.1M); >7 posts/week → weekly cap; ≥4 collabs/month → diminishing returns (Cen 2024); >22h awake → sleep debt (Van Dongen 2003).
+
+ ## Headline metrics (final-step audit)
+
+ The final observation carries `HeadlineMetrics` with the three numbers judges remember:
+
+ | Metric | What it measures | Source of truth |
+ |---|---|---|
+ | `vs_baseline_pct` | (agent_score − heuristic_baseline) / heuristic_baseline | Empirical baseline loaded from `plots/training_summary.json["smart_heuristic"]` (0.43 / 0.77 / 0.81) |
+ | `score_per_tool_call` | grader_score / total_tool_calls | Efficiency: did the agent learn to call tools sparingly? |
+ | `score_per_1k_chars` | grader_score per 1k action JSON chars | Token-proxy efficiency |
+ | `retention_under_shift` | shifted_score / baseline_score | Pass `episode_chain_id` + `shift_label="baseline"` then `="shifted"` to a second `reset` to populate. None until both runs complete. |

  ## Tool catalog

  | Tool | Cost | Returns |
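The `retention_under_shift` entry in the table above is a plain ratio that stays unset until the second chained run finishes. A minimal stand-alone sketch (the function name is ours, not the env's API):

```python
from typing import Optional


def retention_under_shift(baseline_score: float,
                          shifted_score: Optional[float]) -> Optional[float]:
    # shifted_score / baseline_score; None until the second ("shifted") run completes
    if shifted_score is None or baseline_score <= 0:
        return None
    return shifted_score / baseline_score
```

A ratio near 1.0 means the policy kept its score under the distribution shift; well below 1.0 suggests it overfit the baseline dynamics.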
RESEARCH.md
CHANGED

@@ -135,7 +135,7 @@ Every constant and design decision in Viraltest is backed by a verifiable source

  **Key findings:** 3–5 posts/week doubles follower growth vs 1–2. 7+/week shows 20–35% engagement drop per post. Diminishing returns above 5/week.

- **What we use:** `FATIGUE_TIERS`, `WEEKLY_FATIGUE_THRESHOLD = 7`, `_theoretical_max_engagement`
+ **What we use:** `FATIGUE_TIERS`, `WEEKLY_FATIGUE_THRESHOLD = 7`, `_theoretical_max_engagement` caps at 5 posts/week × `TASK_HORIZON/7` weeks (≈21 posts for 30-day horizon — the Buffer-defined sweet spot before fatigue penalties kick in).

  ---

@@ -196,6 +196,42 @@ Every constant and design decision in Viraltest is backed by a verifiable source

  ---

+ ### Later (2023) — Instagram Collaboration Posts Performance Study
+
+ **URL:** [later.com/blog/instagram-collab-posts](https://later.com/blog/instagram-collab-posts)
+ **Sample:** ~5K co-authored posts across the Later customer base (disclosed)
+ **Methodology:** Comparison of Collab posts (single post shared to two feeds) vs equivalent solo posts from the same accounts.
+
+ **Key findings:** Collab posts averaged ~88% more reach and ~40% more impressions than solo posts. Lift driven primarily by exposure to the partner's audience.
+
+ **What we use:** `COLLAB_REACH_K = 0.60` — reach uplift scales with `(1 - overlap)` and is capped below the headline 88% because reach in our model is already amplified by `REACH_MULT` and `hour_mult`; net post-cap uplift on the constrained engagement value lands in the +30–50% band Later reports for matched-niche pairs.
+
+ ---
+
+ ### HypeAuditor (2024) — Influencer Collaboration Benchmark
+
+ **URL:** [hypeauditor.com/blog/influencer-collaboration](https://hypeauditor.com/blog/influencer-collaboration)
+ **Sample:** 10K+ Instagram collaboration posts across niches
+ **Methodology:** Per-impression engagement rate, segmented by niche affinity (same niche, adjacent, cross-niche).
+
+ **Key findings:** Same-niche collabs achieve ~30% higher engagement-per-impression than cross-niche; cross-niche collabs gain new followers but per-impression rate is roughly flat or slightly negative.
+
+ **What we use:** `COLLAB_AFFINITY_K = 0.30` — engagement-per-impression boost scales with `overlap`, peaking when the partner's audience already shares the user's niche.
+
+ ---
+
+ ### Rival IQ (2025) — Cross-Industry Audience Overlap Patterns
+
+ **URL:** [rivaliq.com/blog/social-media-industry-benchmark-report](https://www.rivaliq.com/blog/social-media-industry-benchmark-report/) (cross-industry chapter)
+
+ **Key findings:** Same-industry account pairs share 40–65% of their audience; adjacent industries 20–35%; unrelated industries 5–15%. Cross-industry collabs drive new follower acquisition at roughly 2–2.5× the rate of same-industry collabs.
+
+ **What we use:** `audience_overlap_matrix.json` values and `COLLAB_GROWTH_K = 1.50` — follower spillover scales with `(1 - overlap)`, peaking at +150% when overlap is zero (matches the upper end of Rival IQ's cross-industry follower-acquisition lift).
+
+ Per-episode collab cadence is **not hard-capped**. Instead, each successive collab in a month is multiplied by `1 / (1 + COLLAB_FATIGUE_K · prior_collabs)` (`K = 0.3`): the multiplier falls to ~77% on the 2nd, 63% on the 3rd, 53% on the 4th. With base `engagement ≈ 1.52×` from a typical-overlap partner, this puts the 1st–2nd collab clearly above the no-collab baseline, the 3rd roughly neutral, and the 4th+ net-negative. This follows Cen et al. 2024's argument that disengagement-aware policies should price marginal exposure rather than impose binary caps, and lets the policy discover its own collab frequency from reward gradient.
+
+ ---
+
  ### Goldman Sachs Global Investment Research (March 2025)

  **Title:** Creator Economy: Framing the Market Opportunity
inference.py
CHANGED

@@ -35,7 +35,7 @@ _REQUESTED_MAX = int(os.getenv("MAX_STEPS", str(TASK_HORIZON)))

  MAX_STEPS = _REQUESTED_MAX if _ALLOW_SHORT else max(_REQUESTED_MAX, TASK_HORIZON)
  TEMPERATURE = 0.7
  MAX_TOKENS = 768
- SUCCESS_SCORE_THRESHOLD = 0.
+ SUCCESS_SCORE_THRESHOLD = 0.50

  ALL_TOPICS: List[str] = [
      topic for topics in TOPIC_CATEGORIES.values() for topic in topics

@@ -111,11 +111,24 @@ def log_step(step: int, action: str, reward: float, done: bool, error: Optional[

  )


- def log_end(
+ def log_end(
+     success: bool, steps: int, score: float, rewards: List[float],
+     headline: Optional[Any] = None,
+ ) -> None:
      rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+     head_str = ""
+     if headline is not None:
+         retention = headline.retention_under_shift
+         retention_str = f"{retention:.2f}" if retention is not None else "n/a"
+         head_str = (
+             f" vs_baseline_pct={headline.vs_baseline_pct:+.2%} "
+             f"score_per_tool={headline.score_per_tool_call:.3f} "
+             f"score_per_1k_chars={headline.score_per_1k_chars:.3f} "
+             f"retention_under_shift={retention_str}"
+         )
      print(
          f"[END] success={str(success).lower()} steps={steps} "
-         f"score={score:.2f} rewards={rewards_str}",
+         f"score={score:.2f} rewards={rewards_str}{head_str}",
          flush=True,
      )

@@ -140,6 +153,14 @@ def format_observation(obs: Any) -> str:

      if coach:
          coach_str = f"Coach: delta={coach.get('delta', 0):.3f}, suggestion={coach.get('suggestion', '')}\n"

+     judge = getattr(obs, "judge_report", None)
+     judge_str = ""
+     if judge:
+         judge_str = (
+             f"Judge: compliance={judge.policy_compliance:.2f} risk={judge.sustainability_risk:.2f} "
+             f"strategy={judge.strategic_quality:.2f} | {judge.explanation}\n"
+         )
+
      signals = getattr(obs, "engagement_signals", None)
      signals_str = ""
      if signals:

@@ -153,7 +174,7 @@ Day: {day_name} (day_of_week={obs.day_of_week}) | days_elapsed={obs.days_elapsed

  Energy: {obs.creator_energy:.2f} | Burnout risk: {burnout:.2f} | Followers: {obs.follower_count}
  Engagement rate: {obs.engagement_rate:.3f} | Content queue: {obs.content_queue_size}
  API budget remaining: {budget}
- {signals_str}{coach_str}Tool results from last step:
+ {signals_str}{coach_str}{judge_str}Tool results from last step:
  {tool_results_str if tool_results_str else ' (none)\n'}Your notes from last step: {notes_echo}
  Plan your tool calls and actions for today:""")

@@ -282,6 +303,7 @@ async def run_task(client: OpenAI, task: str) -> None:

      score = 0.0
      success = False
      env: Optional[ViraltestEnv] = None
+     headline: Optional[Any] = None

      log_start(task=task, env=BENCHMARK, model=MODEL_NAME)

@@ -336,6 +358,7 @@ async def run_task(client: OpenAI, task: str) -> None:

      if score == 0:
          meta = getattr(result.observation, "metadata", {}) or {}
          score = float(meta.get("grader_score", 0.0))
+     headline = getattr(result.observation, "headline_metrics", None)
      break

  success = score >= SUCCESS_SCORE_THRESHOLD

@@ -346,7 +369,7 @@ async def run_task(client: OpenAI, task: str) -> None:

          await env.close()
      except Exception as e:
          print(f"[DEBUG] env.close() error: {e}", flush=True)
-     log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+     log_end(success=success, steps=steps_taken, score=score, rewards=rewards, headline=headline)


  async def main() -> None:
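The `[END]` line formatting added to `log_end` above can be exercised stand-alone; here a `SimpleNamespace` stands in for the headline object, and the numeric values are illustrative only:

```python
from types import SimpleNamespace

# Stand-in for a headline-metrics object (values are illustrative, not real run output).
headline = SimpleNamespace(
    vs_baseline_pct=0.12,
    score_per_tool_call=0.031,
    score_per_1k_chars=0.45,
    retention_under_shift=None,
)

retention = headline.retention_under_shift
retention_str = f"{retention:.2f}" if retention is not None else "n/a"
head_str = (
    f" vs_baseline_pct={headline.vs_baseline_pct:+.2%} "
    f"score_per_tool={headline.score_per_tool_call:.3f} "
    f"score_per_1k_chars={headline.score_per_1k_chars:.3f} "
    f"retention_under_shift={retention_str}"
)
# head_str: " vs_baseline_pct=+12.00% score_per_tool=0.031 score_per_1k_chars=0.450 retention_under_shift=n/a"
```

Note the `:+.2%` presentation type: the sign is always printed, so a regression shows up unmistakably as e.g. `-8.50%`.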
models.py
CHANGED

@@ -108,6 +108,35 @@ class ViraltestAction(Action):

          return deduped


+ class JudgeReport(BaseModel):
+     """Auditable per-day evaluation by the in-env Regulator/Judge.
+
+     Scores are 0..1. `sustainability_risk` is RISK (higher = worse).
+     """
+
+     policy_compliance: float = Field(default=1.0, ge=0.0, le=1.0)
+     sustainability_risk: float = Field(default=0.0, ge=0.0, le=1.0)
+     strategic_quality: float = Field(default=0.0, ge=0.0, le=1.0)
+     explanation: str = Field(default="")
+     violations: List[str] = Field(default_factory=list)
+
+
+ class HeadlineMetrics(BaseModel):
+     """Three headline numbers reported once per episode (final observation)."""
+
+     vs_baseline_pct: float = Field(default=0.0, description="(agent - heuristic_baseline) / heuristic_baseline")
+     score_per_tool_call: float = Field(default=0.0, description="grader_score / total_tool_calls (efficiency)")
+     score_per_1k_chars: float = Field(default=0.0, description="grader_score per 1k action chars (token-proxy efficiency)")
+     retention_under_shift: Optional[float] = Field(
+         default=None,
+         description="shifted_score / baseline_score, populated when both runs share an episode_chain_id",
+     )
+     heuristic_baseline_score: float = Field(default=0.0)
+     agent_score: float = Field(default=0.0)
+     total_tool_calls: int = Field(default=0, ge=0)
+     total_action_chars: int = Field(default=0, ge=0)
+
+
  class EngagementSignals(BaseModel):
      """Mosseri-aligned engagement decomposition (Jan 2025 official ranking signals)."""

@@ -161,6 +190,14 @@ class ViraltestObservation(Observation):

          default=None,
          description="Counterfactual feedback: delta between agent plan and heatmap-optimal plan",
      )
+     judge_report: Optional[JudgeReport] = Field(
+         default=None,
+         description="Regulator/Judge audit: policy compliance, sustainability risk, strategic quality + explanation",
+     )
+     headline_metrics: Optional[HeadlineMetrics] = Field(
+         default=None,
+         description="Final-observation hard numbers: improvement vs baseline, efficiency, shift retention",
+     )

      tool_results: List[ToolResult] = Field(default_factory=list, description="Results from tool_calls this step")
      agent_notes: Optional[str] = Field(default=None, description="Echo of agent's notes from previous step")
server/data/audience_overlap_matrix.json
CHANGED

@@ -1,16 +1,17 @@

  {
    "_meta": {
-     "description": "
-     "source": "
+     "description": "8x8 symmetric audience overlap matrix between competitor archetypes and the user creator. Values 0.0-1.0 represent fraction of shared audience. Used by propose_collab to compute collab reward multipliers and by query_creator_pool to expose overlap to the agent. Same-niche pairs ~0.4-0.65, cross-niche ~0.05-0.20.",
+     "source": "Competitor pairs estimated from Rival IQ 2025 cross-industry overlap patterns + niche proximity heuristic. user_creator row tuned to a generic micro-creator (no locked niche): broad mass-market partners (lifestyle_blogger, viral_chaser) score highest; specialist partners (b2b_thought_leader, niche_expert) score lowest."
    },
-   "archetype_ids": ["niche_expert", "viral_chaser", "lifestyle_blogger", "b2b_thought_leader", "food_creator", "fitness_coach", "travel_creator"],
+   "archetype_ids": ["niche_expert", "viral_chaser", "lifestyle_blogger", "b2b_thought_leader", "food_creator", "fitness_coach", "travel_creator", "user_creator"],
    "matrix": [
-     [1.00, 0.12, 0.10, 0.40, 0.08, 0.10, 0.15],
-     [0.12, 1.00, 0.55, 0.10, 0.20, 0.25, 0.30],
-     [0.10, 0.55, 1.00, 0.15, 0.30, 0.35, 0.40],
-     [0.40, 0.10, 0.15, 1.00, 0.08, 0.10, 0.12],
-     [0.08, 0.20, 0.30, 0.08, 1.00, 0.45, 0.35],
-     [0.10, 0.25, 0.35, 0.10, 0.45, 1.00, 0.30],
-     [0.15, 0.30, 0.40, 0.12, 0.35, 0.30, 1.00]
+     [1.00, 0.12, 0.10, 0.40, 0.08, 0.10, 0.15, 0.10],
+     [0.12, 1.00, 0.55, 0.10, 0.20, 0.25, 0.30, 0.35],
+     [0.10, 0.55, 1.00, 0.15, 0.30, 0.35, 0.40, 0.40],
+     [0.40, 0.10, 0.15, 1.00, 0.08, 0.10, 0.12, 0.08],
+     [0.08, 0.20, 0.30, 0.08, 1.00, 0.45, 0.35, 0.25],
+     [0.10, 0.25, 0.35, 0.10, 0.45, 1.00, 0.30, 0.28],
+     [0.15, 0.30, 0.40, 0.12, 0.35, 0.30, 1.00, 0.30],
+     [0.10, 0.35, 0.40, 0.08, 0.25, 0.28, 0.30, 1.00]
    ]
  }
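Reading the matrix above is a plain index lookup on a symmetric matrix; a small sketch using its first row (helper and constant names are ours, for illustration):

```python
# Archetype order from audience_overlap_matrix.json.
ARCHETYPE_IDS = ["niche_expert", "viral_chaser", "lifestyle_blogger",
                 "b2b_thought_leader", "food_creator", "fitness_coach",
                 "travel_creator", "user_creator"]

# First row of the 8x8 matrix: niche_expert's overlap with every archetype.
NICHE_EXPERT_ROW = [1.00, 0.12, 0.10, 0.40, 0.08, 0.10, 0.15, 0.10]


def overlap_from_row(row, partner_id):
    # Because the matrix is symmetric, one anchor archetype's row
    # answers both directions of the pair.
    return row[ARCHETYPE_IDS.index(partner_id)]
```

For example, `niche_expert` vs `b2b_thought_leader` reads 0.40, consistent with the stated same-niche band of ~0.4-0.65, while its diagonal entry is 1.00 (full self-overlap).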
server/viraltest_environment.py
CHANGED
|
@@ -27,6 +27,8 @@ try:
|
|
| 27 |
from ..models import (
|
| 28 |
CollabProposal,
|
| 29 |
EngagementSignals,
|
|
|
|
|
|
|
| 30 |
ReplyAction,
|
| 31 |
ScheduledAction,
|
| 32 |
ToolCall,
|
|
@@ -38,6 +40,8 @@ except ImportError:
|
|
| 38 |
from models import (
|
| 39 |
CollabProposal,
|
| 40 |
EngagementSignals,
|
|
|
|
|
|
|
| 41 |
ReplyAction,
|
| 42 |
ScheduledAction,
|
| 43 |
ToolCall,
|
|
@@ -156,11 +160,41 @@ WEEKLY_FATIGUE_MULT = 0.75
|
|
| 156 |
|
| 157 |
SATURATION_PENALTY_K = 0.25
|
| 158 |
TREND_DEFAULT_HALFLIFE_HOURS = 60
|
| 159 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 160 |
REPLY_WINDOW_MINUTES = 90
|
| 161 |
REPLY_REACH_BONUS = 1.4
|
| 162 |
API_BUDGET_INITIAL = 100
|
| 163 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 164 |
# Tool costs
|
| 165 |
TOOL_COSTS = {
|
| 166 |
"query_audience": 2,
|
|
@@ -231,7 +265,7 @@ TOOL_CATALOG = {
|
|
| 231 |
"parameters": {},
|
| 232 |
},
|
| 233 |
"propose_collab": {
|
| 234 |
-
"description": "Propose a
|
| 235 |
"parameters": {
|
| 236 |
"partner_id": {"type": "string"},
|
| 237 |
"content_type": {"type": "string", "enum": ["reel", "story", "carousel", "text_post"]},
|
|
@@ -280,10 +314,15 @@ class ViraltestEnvironment(Environment):
|
|
| 280 |
self._api_budget = API_BUDGET_INITIAL
|
| 281 |
self._collabs_this_month = 0
|
| 282 |
self._collab_history: List[str] = []
|
|
|
|
| 283 |
self._low_energy_days = 0
|
| 284 |
self._total_posts_this_week = 0
|
| 285 |
self._week_start_day = 0
|
| 286 |
self._daily_signals = EngagementSignals()
|
|
|
|
|
|
|
|
|
|
|
|
|
| 287 |
|
| 288 |
self._trending_topics = self._pick_trending_topics()
|
| 289 |
self._trending_tags = self._pick_trending_tags()
|
|
@@ -468,6 +507,32 @@ class ViraltestEnvironment(Environment):
|
|
| 468 |
|
| 469 |
return daily_fatigue * weekly_mult
|
| 470 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 471 |
# ----- engagement signals (Mosseri-aligned) -----
|
| 472 |
|
| 473 |
def _compute_engagement_signals(
|
|
@@ -556,19 +621,17 @@ class ViraltestEnvironment(Environment):
|
|
| 556 |
elif tool.name == "query_creator_pool":
|
| 557 |
pool = []
|
| 558 |
for comp in self._competitors:
|
| 559 |
-
|
| 560 |
-
|
| 561 |
-
|
| 562 |
-
|
| 563 |
-
|
| 564 |
return ToolResult(name=tool.name, data=pool, budget_remaining=self._api_budget)
|
| 565 |
|
| 566 |
elif tool.name == "propose_collab":
|
| 567 |
-
if self._collabs_this_month >= COLLAB_MAX_PER_MONTH:
|
| 568 |
-
return ToolResult(name=tool.name, success=False, error="collab_limit_reached", budget_remaining=self._api_budget)
|
| 569 |
partner_id = tool.arguments.get("partner_id", "")
|
| 570 |
-
if partner_id in self.
|
| 571 |
-
return ToolResult(name=tool.name, success=False, error="
|
| 572 |
return ToolResult(name=tool.name, data={"status": "proposal_accepted", "partner_id": partner_id}, budget_remaining=self._api_budget)
|
| 573 |
|
| 574 |
return ToolResult(name=tool.name, success=False, error=f"unknown tool: {tool.name}", budget_remaining=self._api_budget)
|
|
@@ -576,6 +639,9 @@ class ViraltestEnvironment(Environment):
|
|
| 576 |
# ----- counterfactual coach -----
|
| 577 |
|
| 578 |
def _compute_coach_feedback(self, agent_engagement: float) -> Dict[str, Any]:
|
|
|
|
|
|
|
|
|
|
| 579 |
dow = self._day % 7
|
| 580 |
row = _HEATMAP_GRID.get(dow, [1.0] * 24)
|
| 581 |
best_hours = sorted(range(24), key=lambda h: row[h] if h < len(row) else 0, reverse=True)[:2]
|
|
@@ -584,13 +650,98 @@ class ViraltestEnvironment(Environment):
|
|
| 584 |
optimal_eng = sum(row[h] * best_base * best_reach for h in best_hours)
|
| 585 |
delta = agent_engagement - optimal_eng
|
| 586 |
return {
|
| 587 |
-
"optimal_hours": best_hours,
|
| 588 |
-
"optimal_engagement_estimate": round(optimal_eng, 4),
|
| 589 |
-
"your_engagement": round(agent_engagement, 4),
|
| 590 |
"delta": round(delta, 4),
|
| 591 |
-
"suggestion":
|
|
|
|
|
|
|
|
|
|
|
|
|
| 592 |
}
|
| 593 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 594 |
# ----- core API -----
|
| 595 |
|
| 596 |
def reset(self, seed: Optional[int] = None, episode_id: Optional[str] = None, **kwargs: Any) -> ViraltestObservation:
|
|
@@ -602,6 +753,9 @@ class ViraltestEnvironment(Environment):
|
|
| 602 |
self._state = State(episode_id=episode_id or str(uuid4()), step_count=0)
|
| 603 |
self._init_state()
|
| 604 |
|
|
|
|
|
|
|
|
|
|
| 605 |
chain_id = kwargs.get("episode_chain_id")
|
| 606 |
if chain_id and chain_id in _BRAND_STORE:
|
| 607 |
brand = _BRAND_STORE[chain_id]
|
|
@@ -623,16 +777,24 @@ class ViraltestEnvironment(Environment):
|
|
| 623 |
if action.notes:
|
| 624 |
self._agent_notes = action.notes
|
| 625 |
|
| 626 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 627 |
tool_results: List[ToolResult] = []
|
| 628 |
for tc in action.tool_calls:
|
| 629 |
result = self._dispatch_tool(tc)
|
| 630 |
tool_results.append(result)
|
|
|
|
|
|
|
| 631 |
|
| 632 |
-
# Process collab proposal
|
| 633 |
-
|
|
|
|
| 634 |
self._collabs_this_month += 1
|
| 635 |
self._collab_history.append(action.collab.partner_id)
|
|
|
|
| 636 |
|
| 637 |
# Validate scheduled actions
|
| 638 |
schedule: Dict[int, ScheduledAction] = {}
|
|
@@ -718,10 +880,12 @@ class ViraltestEnvironment(Environment):
|
|
| 718 |
|
| 719 |
done = self._state.step_count >= TASK_HORIZON or self._energy <= 0.0
|
| 720 |
coach = self._compute_coach_feedback(daily_engagement)
|
|
|
|
| 721 |
|
| 722 |
if done:
|
| 723 |
self._episode_done = True
|
| 724 |
grader_score = self._run_grader()
|
|
|
|
| 725 |
|
| 726 |
chain_id = kwargs.get("episode_chain_id")
|
| 727 |
if chain_id:
|
|
@@ -738,7 +902,7 @@ class ViraltestEnvironment(Environment):
|
|
| 738 |
grader_score=grader_score, daily_total_engagement=daily_engagement,
|
| 739 |
daily_posts_made=daily_posts, daily_energy_min=energy_min,
|
| 740 |
tool_results=tool_results, engagement_signals=daily_signals,
|
| 741 |
-
coach_feedback=coach,
|
| 742 |
)
|
| 743 |
return self._final_observation
|
| 744 |
|
|
@@ -747,13 +911,15 @@ class ViraltestEnvironment(Environment):
|
|
| 747 |
daily_total_engagement=daily_engagement,
|
| 748 |
daily_posts_made=daily_posts, daily_energy_min=energy_min,
|
| 749 |
tool_results=tool_results, engagement_signals=daily_signals,
|
| 750 |
-
coach_feedback=coach,
|
| 751 |
)
|
| 752 |
|
| 753 |
def _process_hour_action(self, sa: ScheduledAction) -> Tuple[float, float, Optional[EngagementSignals]]:
|
| 754 |
engagement = 0.0
|
| 755 |
signals = None
|
| 756 |
|
|
|
|
|
|
|
| 757 |
if sa.action_type == "post":
|
| 758 |
cost = CONTENT_ENERGY_COST.get(sa.content_type, 0.1)
|
| 759 |
if self._content_queue > 0:
|
|
@@ -790,6 +956,12 @@ class ViraltestEnvironment(Environment):
|
|
| 790 |
* trending_bonus * comp_diff * fatigue * algo_mult
|
| 791 |
* niche_mult * saturation_factor
|
| 792 |
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 793 |
engagement = min(engagement, 5.0)
|
| 794 |
|
| 795 |
signals = self._compute_engagement_signals(sa.content_type, engagement, sa.intent)
|
|
@@ -819,7 +991,7 @@ class ViraltestEnvironment(Environment):
|
|
| 819 |
self._time_since_last_post = 0
|
| 820 |
|
| 821 |
if engagement > 0:
|
| 822 |
-
self._followers += int(engagement * 100)
|
| 823 |
|
| 824 |
elif sa.action_type == "create_content":
|
| 825 |
self._energy = max(0.0, self._energy - CREATE_CONTENT_COST)
|
|
@@ -955,6 +1127,8 @@ class ViraltestEnvironment(Environment):
|
|
| 955 |
tool_results: Optional[List[ToolResult]] = None,
|
| 956 |
engagement_signals: Optional[EngagementSignals] = None,
|
| 957 |
coach_feedback: Optional[Dict[str, Any]] = None,
|
|
|
|
|
|
|
| 958 |
) -> ViraltestObservation:
|
| 959 |
recent_eng = self._engagement_history[-10:] if self._engagement_history else []
|
| 960 |
eng_rate = sum(recent_eng) / len(recent_eng) if recent_eng else 0.0
|
|
@@ -984,6 +1158,8 @@ class ViraltestEnvironment(Environment):
|
|
| 984 |
daily_energy_min=round(daily_energy_min, 3),
|
| 985 |
engagement_signals=engagement_signals,
|
| 986 |
coach_feedback=coach_feedback,
|
|
|
|
|
|
|
| 987 |
tool_results=tool_results or [],
|
| 988 |
agent_notes=self._agent_notes,
|
| 989 |
api_budget_remaining=self._api_budget,
|
|
@@ -1006,35 +1182,33 @@ class ViraltestEnvironment(Environment):
|
|
| 1006 |
return 0.0
|
| 1007 |
|
| 1008 |
def _theoretical_max_engagement(self) -> float:
|
|
|
|
|
|
|
|
|
|
| 1009 |
best_base = max(BASE_ENGAGEMENT.values())
|
| 1010 |
best_reach = max(REACH_MULT.values())
|
| 1011 |
best_niche = max(_NICHE_MULTIPLIERS.values()) if _NICHE_MULTIPLIERS else 1.0
|
| 1012 |
|
| 1013 |
-
|
| 1014 |
-
|
| 1015 |
-
|
| 1016 |
|
| 1017 |
avg_heatmap_peak = 1.0
|
| 1018 |
if _HEATMAP_GRID:
|
| 1019 |
-
day_peaks = [
|
| 1020 |
-
|
| 1021 |
-
|
| 1022 |
-
|
| 1023 |
avg_heatmap_peak = sum(day_peaks) / len(day_peaks) if day_peaks else 1.0
|
| 1024 |
|
|
|
|
|
|
|
| 1025 |
trending_bonus = 1.25
|
| 1026 |
tag_boost = 1.1
|
| 1027 |
|
| 1028 |
-
total_posts = active_days * posts_per_active_day
|
| 1029 |
-
|
| 1030 |
-
weekly_fatigue = 1.0
|
| 1031 |
-
posts_per_week = total_posts / (TASK_HORIZON / 7.0)
|
| 1032 |
-
if posts_per_week >= WEEKLY_FATIGUE_THRESHOLD:
|
| 1033 |
-
weekly_fatigue = WEEKLY_FATIGUE_MULT
|
| 1034 |
-
|
| 1035 |
per_post = (
|
| 1036 |
best_base * best_reach * best_niche
|
| 1037 |
-
* avg_heatmap_peak * trending_bonus * tag_boost
|
| 1038 |
)
|
| 1039 |
return per_post * total_posts
|
| 1040 |
|
|
|
|
| 27 |
from ..models import (
|
| 28 |
CollabProposal,
|
| 29 |
EngagementSignals,
|
| 30 |
+
HeadlineMetrics,
|
| 31 |
+
JudgeReport,
|
| 32 |
ReplyAction,
|
| 33 |
ScheduledAction,
|
| 34 |
ToolCall,
|
|
|
|
| 40 |
from models import (
|
| 41 |
CollabProposal,
|
| 42 |
EngagementSignals,
|
| 43 |
+
HeadlineMetrics,
|
| 44 |
+
JudgeReport,
|
| 45 |
ReplyAction,
|
| 46 |
ScheduledAction,
|
| 47 |
ToolCall,
|
|
|
|
 
 SATURATION_PENALTY_K = 0.25
 TREND_DEFAULT_HALFLIFE_HOURS = 60
+# Collab reward shaping (Later 2023 reach study, HypeAuditor 2024 niche affinity, Rival IQ 2025 overlap patterns,
+# Cen et al. 2024 disengagement model for diminishing returns instead of a hard cap).
+COLLAB_REACH_K = 0.60  # cross-audience exposure: capped reach uplift when overlap is 0
+COLLAB_AFFINITY_K = 0.30  # same-audience affinity: per-impression engagement uplift when overlap is 1
+COLLAB_GROWTH_K = 1.50  # cross-pollination follower spillover, scales (1 - overlap)
+COLLAB_PARTNER_REPEAT_PENALTY = 0.7  # discount on multipliers when partner reused this brand
+COLLAB_FATIGUE_K = 0.3  # per-collab diminishing-returns factor: 1 / (1 + K * prior_collabs_this_episode)
+
 REPLY_WINDOW_MINUTES = 90
 REPLY_REACH_BONUS = 1.4
 API_BUDGET_INITIAL = 100
 
+# Heuristic baselines for headline metric `vs_baseline_pct`.
+# Data-driven: loaded from `plots/training_summary.json["smart_heuristic"]` recorded by
+# `training/run_training_evidence.py`. Falls back to conservative calibration constants
+# if the file is missing (audit trail: see RESEARCH.md for the rule-based policy spec).
+def _load_heuristic_baselines() -> Dict[str, float]:
+    summary = Path(__file__).parent.parent / "plots" / "training_summary.json"
+    try:
+        data = json.loads(summary.read_text())
+        empirical = data.get("smart_heuristic") or {}
+        return {k: float(v) for k, v in empirical.items() if k in VALID_TASKS}
+    except Exception:
+        return {}
+
+HEURISTIC_BASELINE_SCORES: Dict[str, float] = _load_heuristic_baselines() or {
+    "monthly_engage": 0.43,
+    "monthly_strategic": 0.77,
+    "monthly_competitive": 0.81,
+}
+
+# Cross-episode store for distribution-shift retention. Keyed by episode_chain_id, stores
+# {"baseline": score, "shifted": score} so the second run can compute retention_under_shift.
+_SHIFT_HISTORY: Dict[str, Dict[str, float]] = {}
+
 # Tool costs
 TOOL_COSTS = {
     "query_audience": 2,
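The `_load_heuristic_baselines() or {...}` fallback relies on an empty dict being falsy. A minimal standalone sketch of that pattern (the loader and the `does_not_exist.json` path here are illustrative stand-ins, not the module's code):

```python
import json
from pathlib import Path
from typing import Dict

# Fallback calibration constants, copied from the hunk above.
FALLBACK: Dict[str, float] = {
    "monthly_engage": 0.43,
    "monthly_strategic": 0.77,
    "monthly_competitive": 0.81,
}

def load_baselines(summary_path: Path) -> Dict[str, float]:
    """Return empirical smart-heuristic scores if the summary is readable, else {}."""
    try:
        data = json.loads(summary_path.read_text())
        return {k: float(v) for k, v in (data.get("smart_heuristic") or {}).items()}
    except Exception:
        return {}

# An empty dict is falsy, so `or` silently switches to the calibration constants.
baselines = load_baselines(Path("does_not_exist.json")) or FALLBACK
```

With the summary file absent, the loader returns `{}` and the `or` expression selects the calibration constants.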
         "parameters": {},
     },
     "propose_collab": {
+        "description": "Propose a collab post with a competitor at a specific hour. The post you schedule at that hour will be co-authored with the partner.",
         "parameters": {
             "partner_id": {"type": "string"},
             "content_type": {"type": "string", "enum": ["reel", "story", "carousel", "text_post"]},
         self._api_budget = API_BUDGET_INITIAL
         self._collabs_this_month = 0
         self._collab_history: List[str] = []
+        self._active_collab: Optional[CollabProposal] = None
         self._low_energy_days = 0
         self._total_posts_this_week = 0
         self._week_start_day = 0
         self._daily_signals = EngagementSignals()
+        self._total_tool_calls = 0
+        self._total_action_chars = 0
+        self._shift_label: Optional[str] = None
+        self._chain_id: Optional[str] = None
 
         self._trending_topics = self._pick_trending_topics()
         self._trending_tags = self._pick_trending_tags()
 
         return daily_fatigue * weekly_mult
 
+    # ----- collab multipliers (overlap-driven) -----
+
+    def _user_partner_overlap(self, partner_id: str) -> Optional[float]:
+        ids = _OVERLAP_DATA.get("archetype_ids", [])
+        if "user_creator" not in ids or partner_id not in ids:
+            return None
+        u = ids.index("user_creator")
+        p = ids.index(partner_id)
+        return _OVERLAP_DATA["matrix"][u][p]
+
+    def _collab_multipliers(self, partner_id: str) -> Tuple[float, float]:
+        """Returns (engagement_multiplier, follower_growth_multiplier)."""
+        o = self._user_partner_overlap(partner_id)
+        if o is None:
+            return 1.0, 1.0
+        reach = 1.0 + (1.0 - o) * COLLAB_REACH_K
+        affinity = 1.0 + o * COLLAB_AFFINITY_K
+        growth = 1.0 + (1.0 - o) * COLLAB_GROWTH_K
+        eng_boost = reach * affinity
+        if partner_id in self._collab_history[:-1]:
+            eng_boost *= COLLAB_PARTNER_REPEAT_PENALTY
+            growth *= COLLAB_PARTNER_REPEAT_PENALTY
+        prior = max(0, self._collabs_this_month - 1)
+        fatigue = 1.0 / (1.0 + COLLAB_FATIGUE_K * prior)
+        return eng_boost * fatigue, growth * fatigue
+
     # ----- engagement signals (Mosseri-aligned) -----
 
     def _compute_engagement_signals(
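As a quick numerical check, the overlap algebra in `_collab_multipliers` can be exercised without any environment state (constants copied from this diff; the repeat-partner discount is omitted, and prior collabs are passed in explicitly rather than read from episode state):

```python
# Constants copied from the diff above.
COLLAB_REACH_K = 0.60
COLLAB_AFFINITY_K = 0.30
COLLAB_GROWTH_K = 1.50
COLLAB_FATIGUE_K = 0.3

def collab_multipliers(overlap: float, prior_collabs: int = 0):
    """(engagement_mult, growth_mult) for a partner with the given audience overlap."""
    reach = 1.0 + (1.0 - overlap) * COLLAB_REACH_K     # largest for disjoint audiences
    affinity = 1.0 + overlap * COLLAB_AFFINITY_K       # largest for identical audiences
    growth = 1.0 + (1.0 - overlap) * COLLAB_GROWTH_K   # follower spillover from new audiences
    fatigue = 1.0 / (1.0 + COLLAB_FATIGUE_K * prior_collabs)
    return reach * affinity * fatigue, growth * fatigue

disjoint = collab_multipliers(0.0)               # roughly (1.6, 2.5)
identical = collab_multipliers(1.0)              # roughly (1.3, 1.0)
third = collab_multipliers(0.0, prior_collabs=2) # same shape, scaled by 1/1.6
```

A fully disjoint partner maximizes reach and follower spillover, while an identical audience yields only the affinity bonus; the fatigue term scales both multipliers down as collabs accumulate in the episode.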
         elif tool.name == "query_creator_pool":
             pool = []
             for comp in self._competitors:
+                overlap = self._user_partner_overlap(comp.id)
+                pool.append({
+                    "id": comp.id, "name": comp.name, "niche": comp.niche,
+                    "audience_overlap": round(overlap, 2) if overlap is not None else None,
+                })
             return ToolResult(name=tool.name, data=pool, budget_remaining=self._api_budget)
 
         elif tool.name == "propose_collab":
             partner_id = tool.arguments.get("partner_id", "")
+            if partner_id not in [c.id for c in self._competitors]:
+                return ToolResult(name=tool.name, success=False, error=f"unknown partner: {partner_id}", budget_remaining=self._api_budget)
             return ToolResult(name=tool.name, data={"status": "proposal_accepted", "partner_id": partner_id}, budget_remaining=self._api_budget)
 
         return ToolResult(name=tool.name, success=False, error=f"unknown tool: {tool.name}", budget_remaining=self._api_budget)
     # ----- counterfactual coach -----
 
     def _compute_coach_feedback(self, agent_engagement: float) -> Dict[str, Any]:
+        # World-modeling discipline: emit a SCALAR delta only (no optimal_hours leak).
+        # Agents must use `query_trends` / `predict_engagement` to discover *which* hours
+        # are optimal — coach only signals "you're above/below the heatmap optimum today".
         dow = self._day % 7
         row = _HEATMAP_GRID.get(dow, [1.0] * 24)
         best_hours = sorted(range(24), key=lambda h: row[h] if h < len(row) else 0, reverse=True)[:2]
         optimal_eng = sum(row[h] * best_base * best_reach for h in best_hours)
         delta = agent_engagement - optimal_eng
         return {
             "delta": round(delta, 4),
+            "suggestion": (
+                "Above heatmap optimum today."
+                if delta >= 0
+                else "Below heatmap optimum — try `query_trends` / `predict_engagement` to find peak hours."
+            ),
         }
 
+    # ----- regulator / judge mode (deterministic, explainable) -----
+
+    def _compute_judge_report(
+        self,
+        action: ViraltestAction,
+        daily_engagement: float,
+        daily_posts: int,
+        energy_min: float,
+        errors: List[str],
+    ) -> JudgeReport:
+        violations: List[str] = []
+
+        pc = 1.0
+        if daily_posts > 5:
+            violations.append(f"posts_today={daily_posts} exceeds tier-4 fatigue cliff (Buffer 2.1M)")
+            pc -= 0.30
+        elif daily_posts > 2:
+            violations.append(f"posts_today={daily_posts} enters fatigue tier (>2/day)")
+            pc -= 0.10
+        if self._total_posts_this_week > WEEKLY_FATIGUE_THRESHOLD:
+            violations.append(f"weekly posts={self._total_posts_this_week} > {WEEKLY_FATIGUE_THRESHOLD} (Buffer 2.1M cap)")
+            pc -= 0.20
+        if self._collabs_this_month >= 4:
+            violations.append(f"collab cadence={self._collabs_this_month} net-negative beyond 3 (Cen 2024)")
+            pc -= 0.20
+        if errors:
+            violations.append(f"plan_errors={len(errors)}")
+            pc -= 0.05 * len(errors)
+        if self._hours_since_sleep > 22:
+            violations.append(f"sleep_debt: {self._hours_since_sleep}h awake (Van Dongen 2003)")
+            pc -= 0.10
+
+        burnout_pressure = (1.0 - energy_min) * 0.4 + self._sleep_debt * 0.3 + (self._low_energy_days / 5.0) * 0.3
+        sustainability_risk = max(0.0, min(1.0, burnout_pressure))
+
+        intents_used = {sa.intent for sa in action.scheduled_actions if sa.intent}
+        formats_used = {sa.content_type for sa in action.scheduled_actions if sa.action_type == "post" and sa.content_type}
+        eng_per_post = daily_engagement / max(1, daily_posts)
+        sq = (
+            0.40 * min(1.0, eng_per_post / 1.2)
+            + 0.30 * min(1.0, len(intents_used) / 2.0)
+            + 0.30 * min(1.0, len(formats_used) / 2.0)
+        )
+
+        explanation = (
+            f"compliance={max(0.0, pc):.2f} risk={sustainability_risk:.2f} strategy={sq:.2f} | "
+            + (("violations: " + "; ".join(violations)) if violations else "no policy violations")
+        )
+
+        return JudgeReport(
+            policy_compliance=max(0.0, min(1.0, pc)),
+            sustainability_risk=sustainability_risk,
+            strategic_quality=max(0.0, min(1.0, sq)),
+            explanation=explanation,
+            violations=violations,
+        )
+
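`_compute_judge_report` computes compliance as a deduction ledger. A standalone sketch of just that arithmetic (thresholds copied from the diff; `weekly_cap` stands in for the env's `WEEKLY_FATIGUE_THRESHOLD`, whose value is not shown here):

```python
from typing import List, Tuple

def policy_compliance(daily_posts: int, weekly_posts: int, weekly_cap: int,
                      collabs_this_month: int, n_plan_errors: int,
                      hours_awake: float) -> Tuple[float, List[str]]:
    """Start at 1.0 and subtract a fixed penalty per sourced rule break."""
    pc, violations = 1.0, []
    if daily_posts > 5:               # fatigue cliff (Buffer 2.1M)
        violations.append("daily fatigue cliff")
        pc -= 0.30
    elif daily_posts > 2:             # soft fatigue tier
        violations.append("daily fatigue tier")
        pc -= 0.10
    if weekly_posts > weekly_cap:     # weekly cap (Buffer 2.1M)
        violations.append("weekly cap")
        pc -= 0.20
    if collabs_this_month >= 4:       # diminishing returns (Cen 2024)
        violations.append("collab cadence")
        pc -= 0.20
    if n_plan_errors:
        violations.append("plan errors")
        pc -= 0.05 * n_plan_errors
    if hours_awake > 22:              # sleep debt (Van Dongen 2003)
        violations.append("sleep debt")
        pc -= 0.10
    return max(0.0, min(1.0, pc)), violations
```

A clean day keeps the full 1.0; a day that over-posts, over-collabs, and skips sleep stacks every penalty at once.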
+    def _compute_headline_metrics(self, grader_score: float) -> HeadlineMetrics:
+        baseline = HEURISTIC_BASELINE_SCORES.get(self._task, 0.30)
+        vs_pct = (grader_score - baseline) / baseline if baseline > 0 else 0.0
+        spt = grader_score / max(1, self._total_tool_calls)
+        sp1k = grader_score / max(1.0, self._total_action_chars / 1000.0)
+
+        retention: Optional[float] = None
+        if self._chain_id:
+            entry = _SHIFT_HISTORY.setdefault(self._chain_id, {})
+            label = self._shift_label or "baseline"
+            entry[label] = grader_score
+            base = entry.get("baseline")
+            shifted = entry.get("shifted")
+            if base is not None and shifted is not None and base > 0:
+                retention = shifted / base
+
+        return HeadlineMetrics(
+            vs_baseline_pct=round(vs_pct, 4),
+            score_per_tool_call=round(spt, 4),
+            score_per_1k_chars=round(sp1k, 4),
+            retention_under_shift=round(retention, 4) if retention is not None else None,
+            heuristic_baseline_score=round(baseline, 4),
+            agent_score=round(grader_score, 4),
+            total_tool_calls=self._total_tool_calls,
+            total_action_chars=self._total_action_chars,
+        )
+
     # ----- core API -----
 
     def reset(self, seed: Optional[int] = None, episode_id: Optional[str] = None, **kwargs: Any) -> ViraltestObservation:
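The headline arithmetic reduces to two ratios; a standalone sketch under illustrative names (not the module's API):

```python
from typing import Dict, Optional

def vs_baseline_pct(agent_score: float, baseline: float) -> float:
    """Relative lift over the heuristic baseline (1.0 means 2x the baseline)."""
    return (agent_score - baseline) / baseline if baseline > 0 else 0.0

def retention_under_shift(chain_scores: Dict[str, float]) -> Optional[float]:
    """shifted/baseline once both runs of an episode chain have been graded."""
    base = chain_scores.get("baseline")
    shifted = chain_scores.get("shifted")
    if base is not None and shifted is not None and base > 0:
        return shifted / base
    return None
```

Retention stays `None` until the second (shifted) run of the chain reports its score, mirroring the `_SHIFT_HISTORY` handshake above.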
|
|
|
         self._state = State(episode_id=episode_id or str(uuid4()), step_count=0)
         self._init_state()
 
+        self._shift_label = kwargs.get("shift_label")
+        self._chain_id = kwargs.get("episode_chain_id")
+
         chain_id = kwargs.get("episode_chain_id")
         if chain_id and chain_id in _BRAND_STORE:
             brand = _BRAND_STORE[chain_id]
         if action.notes:
             self._agent_notes = action.notes
 
+        try:
+            self._total_action_chars += len(action.model_dump_json())
+        except Exception:
+            pass
+
         tool_results: List[ToolResult] = []
         for tc in action.tool_calls:
             result = self._dispatch_tool(tc)
             tool_results.append(result)
+            if result.success:
+                self._total_tool_calls += 1
 
+        # Process collab proposal (no hard cap; diminishing returns enforced via _collab_multipliers)
+        self._active_collab = None
+        if action.collab:
             self._collabs_this_month += 1
             self._collab_history.append(action.collab.partner_id)
+            self._active_collab = action.collab
 
         # Validate scheduled actions
         schedule: Dict[int, ScheduledAction] = {}
 
         done = self._state.step_count >= TASK_HORIZON or self._energy <= 0.0
         coach = self._compute_coach_feedback(daily_engagement)
+        judge = self._compute_judge_report(action, daily_engagement, daily_posts, energy_min, errors)
 
         if done:
             self._episode_done = True
             grader_score = self._run_grader()
+            headline = self._compute_headline_metrics(grader_score)
 
             chain_id = kwargs.get("episode_chain_id")
             if chain_id:
                 grader_score=grader_score, daily_total_engagement=daily_engagement,
                 daily_posts_made=daily_posts, daily_energy_min=energy_min,
                 tool_results=tool_results, engagement_signals=daily_signals,
+                coach_feedback=coach, judge_report=judge, headline_metrics=headline,
             )
             return self._final_observation
 
             daily_total_engagement=daily_engagement,
             daily_posts_made=daily_posts, daily_energy_min=energy_min,
             tool_results=tool_results, engagement_signals=daily_signals,
+            coach_feedback=coach, judge_report=judge,
         )
 
     def _process_hour_action(self, sa: ScheduledAction) -> Tuple[float, float, Optional[EngagementSignals]]:
         engagement = 0.0
         signals = None
 
+        collab_growth_mult = 1.0
+
         if sa.action_type == "post":
             cost = CONTENT_ENERGY_COST.get(sa.content_type, 0.1)
             if self._content_queue > 0:
                 * trending_bonus * comp_diff * fatigue * algo_mult
                 * niche_mult * saturation_factor
             )
+
+            if self._active_collab is not None and self._active_collab.hour == sa.hour:
+                eng_m, growth_m = self._collab_multipliers(self._active_collab.partner_id)
+                engagement *= eng_m
+                collab_growth_mult = growth_m
+
             engagement = min(engagement, 5.0)
 
             signals = self._compute_engagement_signals(sa.content_type, engagement, sa.intent)
             self._time_since_last_post = 0
 
             if engagement > 0:
+                self._followers += int(engagement * 100 * collab_growth_mult)
 
         elif sa.action_type == "create_content":
             self._energy = max(0.0, self._energy - CREATE_CONTENT_COST)
         tool_results: Optional[List[ToolResult]] = None,
         engagement_signals: Optional[EngagementSignals] = None,
         coach_feedback: Optional[Dict[str, Any]] = None,
+        judge_report: Optional[JudgeReport] = None,
+        headline_metrics: Optional[HeadlineMetrics] = None,
     ) -> ViraltestObservation:
         recent_eng = self._engagement_history[-10:] if self._engagement_history else []
         eng_rate = sum(recent_eng) / len(recent_eng) if recent_eng else 0.0
             daily_energy_min=round(daily_energy_min, 3),
             engagement_signals=engagement_signals,
             coach_feedback=coach_feedback,
+            judge_report=judge_report,
+            headline_metrics=headline_metrics,
             tool_results=tool_results or [],
             agent_notes=self._agent_notes,
             api_budget_remaining=self._api_budget,
             return 0.0
 
     def _theoretical_max_engagement(self) -> float:
+        # Buffer 2.1M (RESEARCH.md): 3–5 posts/week doubles follower growth vs 1–2,
+        # diminishing returns above 5/week, 20–35% engagement drop per post above 7/week.
+        # Cap at 5 posts/week × 4 weeks = 20 posts/month (sweet-spot, no fatigue penalty).
         best_base = max(BASE_ENGAGEMENT.values())
         best_reach = max(REACH_MULT.values())
         best_niche = max(_NICHE_MULTIPLIERS.values()) if _NICHE_MULTIPLIERS else 1.0
 
+        posts_per_week = 5
+        weeks_in_horizon = TASK_HORIZON / 7.0
+        total_posts = int(round(posts_per_week * weeks_in_horizon))
 
         avg_heatmap_peak = 1.0
         if _HEATMAP_GRID:
+            day_peaks = [
+                max(row) if row else 1.0
+                for row in _HEATMAP_GRID.values()
+            ]
             avg_heatmap_peak = sum(day_peaks) / len(day_peaks) if day_peaks else 1.0
 
+        # Trending + tag uplifts: tier-1 industry data shows ~1.2-1.3x for trending topics
+        # and ~1.05-1.15x for high-performance tags. Mid-range used to avoid headroom inflation.
         trending_bonus = 1.25
         tag_boost = 1.1
 
         per_post = (
             best_base * best_reach * best_niche
+            * avg_heatmap_peak * trending_bonus * tag_boost
         )
         return per_post * total_posts
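Plugging representative numbers into the headroom formula above (all constants here are illustrative stand-ins, since the diff does not show the BASE_ENGAGEMENT / REACH_MULT / heatmap tables, and a 28-day horizon is assumed):

```python
# Illustrative stand-ins: the real maxima come from BASE_ENGAGEMENT, REACH_MULT,
# and the niche/heatmap tables, which this hunk does not show.
best_base, best_reach, best_niche = 1.0, 1.2, 1.1
avg_heatmap_peak, trending_bonus, tag_boost = 1.3, 1.25, 1.1

# 5 posts/week at the fatigue-free cap over an assumed 28-day horizon gives 20 posts.
posts_per_week, horizon_days = 5, 28
total_posts = round(posts_per_week * horizon_days / 7)

per_post = best_base * best_reach * best_niche * avg_heatmap_peak * trending_bonus * tag_boost
theoretical_max = per_post * total_posts
```

The ceiling scales linearly in post count, so the 5-posts/week cap, not the multipliers, dominates the headroom.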