Viraltest v2 — Pitch Deck Outline (8 slides)

Slide 1: Title

  • Viraltest v2: Teaching LLMs World Modeling Through Instagram Strategy
  • Theme #3.1 — Professional Tasks
  • OpenEnv Hackathon India 2026
  • Team: [your team name]

Slide 2: The Problem

  • $250B creator economy, 67M creators (Goldman Sachs 2025)
  • 73% experience burnout; Instagram drives 88% of it (Awin 2024)
  • Algorithm changes constantly — no one tells you the rules
  • Existing tools show analytics but don't teach strategy
  • Gap: No RL environment captures the growth-versus-burnout tradeoff with realistic dynamics

Slide 3: The World

  • 30-day Instagram simulation (monthly cycle)
  • Mosseri-aligned signals: watch_time, sends, saves, likes (official Jan 2025)
  • Hour-by-hour heatmap (Buffer 9.6M + Sprout 2B)
  • 7 competitor archetypes, 5 audience segments, ~120 tags
  • Piecewise-linear sleep model (Van Dongen 2003, Sleep)
  • Tiered audience fatigue (Buffer 2.1M)
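
Speaker-note sketch: a piecewise-linear sleep model can be expressed as linear interpolation between a few knots. The knot values and the name `sleep_multiplier` below are illustrative placeholders, not the constants used in the environment or reported in Van Dongen 2003.

```python
from bisect import bisect_right

# HYPOTHETICAL (hours_slept, energy_multiplier) knots, for illustration only.
SLEEP_KNOTS = [(0.0, 0.2), (4.0, 0.5), (6.0, 0.8), (8.0, 1.0)]


def sleep_multiplier(hours, knots=SLEEP_KNOTS):
    """Linearly interpolate between knots; clamp outside the knot range."""
    xs = [x for x, _ in knots]
    if hours <= xs[0]:
        return knots[0][1]
    if hours >= xs[-1]:
        return knots[-1][1]
    # Index of the first knot strictly to the right of `hours`.
    i = bisect_right(xs, hours)
    (x0, y0), (x1, y1) = knots[i - 1], knots[i]
    return y0 + (y1 - y0) * (hours - x0) / (x1 - x0)
```

Swapping in the real breakpoints only requires changing `SLEEP_KNOTS`; the interpolation logic stays the same.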

Slide 4: The Tools (Theme #3.1 Fit)

  • Agent starts with SPARSE observation (energy, followers, reward)
  • 8 discoverable tools: query_trends, query_competitor, query_audience, query_tag_history, predict_engagement, draft_review, query_creator_pool, propose_collab
  • API budget (100/episode) — can't query everything, must prioritize
  • Notes field for hypothesis tracking across days
  • Counterfactual coach: "here's what would have happened with optimal timing"
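
Speaker-note sketch: one way to enforce a per-episode API budget is a thin wrapper that counts tool calls and refuses further calls once the limit is reached. The class `ToolBudget`, the exception `BudgetExhausted`, and the stub `query_trends_stub` are hypothetical names for illustration; only the limit of 100 calls per episode comes from the slide.

```python
class BudgetExhausted(RuntimeError):
    """Raised when the agent tries to call a tool with no budget left."""


class ToolBudget:
    def __init__(self, limit=100):
        self.limit = limit  # 100 calls per episode, per the slide
        self.used = 0

    @property
    def remaining(self):
        return self.limit - self.used

    def call(self, tool, *args, **kwargs):
        """Run `tool` if budget remains; each attempt that runs costs one call."""
        if self.used >= self.limit:
            raise BudgetExhausted(f"budget of {self.limit} calls spent")
        self.used += 1
        return tool(*args, **kwargs)


def query_trends_stub(tag):
    # Placeholder standing in for a real environment tool.
    return {"tag": tag, "trend": 0.0}
```

Because every call is routed through `ToolBudget.call`, the agent has to decide which queries are worth spending on, which is the prioritization pressure the slide describes.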

Slide 5: Training Pipeline

  • TRL GRPO on Qwen2.5-1.5B-Instruct (free Colab T4)
  • Reward: per-step env reward + 2× terminal grader score
  • 200 episodes, batch 4, 50 GRPO steps
  • 3 tasks: monthly_engage → monthly_strategic → monthly_competitive
  • Multi-episode chain: brand state persists across months
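
Speaker-note sketch: the reward and schedule arithmetic above can be checked in a few lines. This assumes the per-step rewards are summed over the episode before the weighted terminal score is added; the slide does not say how they are aggregated. The function names are hypothetical.

```python
TERMINAL_WEIGHT = 2.0  # the "2x" weight on the terminal grader score

# Task order from the slide.
TASKS = ["monthly_engage", "monthly_strategic", "monthly_competitive"]


def episode_return(step_rewards, grader_score, terminal_weight=TERMINAL_WEIGHT):
    """ASSUMPTION: sum of per-step env rewards plus weighted terminal score."""
    return sum(step_rewards) + terminal_weight * grader_score


def grpo_steps(episodes, batch_size):
    """Number of optimizer steps when each step consumes one batch of episodes."""
    return episodes // batch_size
```

With 200 episodes and a batch of 4, `grpo_steps(200, 4)` gives the 50 GRPO steps quoted on the slide.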

Slide 6: Results

  • [Embed reward_curve.png — ascending curve over training]
  • [Embed before_after.png — smart baseline vs trained agent per task]
  • Trained agent: uses tools on day 1, adapts strategy by day 5, manages energy throughout
  • Score improvement on monthly_competitive: [X% → Y%]

Slide 7: Sources & Verifiability

  • Source quality bar: 4 accepted tiers (peer-reviewed → industry → official → survey)
  • 7 Tier-1 papers, 9 Tier-2 studies, 1 Tier-3 official statement
  • Every constant has a DOI/PMID/arXiv ID
  • Tier-5 SEO blogs explicitly rejected (13 sites listed with rationale)
  • Full bibliography: RESEARCH.md (~6 pages)
  • Any number in this presentation can be debated — we welcome it

Slide 8: Try It

  • HF Space: [link]
  • GitHub: [link]
  • Training notebook: [Colab link]
  • Blog: [HF post link]
  • Video: [YouTube link]
  • Questions?