Viraltest v2 — Pitch Deck Outline (8 slides)

Slide 1: Title

Viraltest v2: Teaching LLMs World Modeling Through Instagram Strategy
Theme #3.1 — Professional Tasks
OpenEnv Hackathon India 2026
Team: [your team name]

Slide 2: The Problem

$250B creator economy, 67M creators (Goldman Sachs 2025)
73% experience burnout; Instagram drives 88% of it (Awin 2024)
Algorithm changes constantly — no one tells you the rules
Existing tools show analytics but don't teach strategy
Gap: No RL environment captures the engagement-versus-burnout tradeoff with realistic dynamics

Slide 3: The World

30-day Instagram simulation (monthly cycle)
Mosseri-aligned signals: watch_time, sends, saves, likes (official statement, Jan 2025)
Hour-by-hour heatmap (Buffer 9.6M + Sprout 2B)
7 competitor archetypes, 5 audience segments, ~120 tags
Piecewise-linear sleep model (Van Dongen 2003, Sleep)
Tiered audience fatigue (Buffer 2.1M)
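
To make "piecewise-linear" concrete, here is a minimal sketch. It assumes the model maps hours of sleep to a performance multiplier, which the outline does not state. The function name `sleep_multiplier` and every breakpoint value are placeholders invented for illustration; they are not the Viraltest constants or figures from Van Dongen (2003).

```python
from bisect import bisect_right

# Placeholder breakpoints: (hours of sleep, performance multiplier).
# These numbers are invented for illustration only.
BREAKPOINTS = [(0.0, 0.2), (4.0, 0.5), (6.0, 0.8), (8.0, 1.0)]


def sleep_multiplier(hours: float) -> float:
    """Linearly interpolate between breakpoints; clamp outside the range."""
    xs = [x for x, _ in BREAKPOINTS]
    ys = [y for _, y in BREAKPOINTS]
    if hours <= xs[0]:
        return ys[0]
    if hours >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, hours) - 1  # index of the segment's left endpoint
    t = (hours - xs[i]) / (xs[i + 1] - xs[i])  # position within the segment
    return ys[i] + t * (ys[i + 1] - ys[i])
```

Each segment has its own slope, so the penalty for losing an hour of sleep can differ between, say, the 6-8 hour range and the 4-6 hour range.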

Slide 4: The Tools (Theme #3.1 Fit)

Agent starts with SPARSE observation (energy, followers, reward)
8 discoverable tools: query_trends, query_competitor, query_audience, query_tag_history, predict_engagement, draft_review, query_creator_pool, propose_collab
API budget (100/episode) — can't query everything, must prioritize
Notes field for hypothesis tracking across days
Counterfactual coach: "here's what would have happened with optimal timing"
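
The budget constraint could be enforced along these lines. This is a sketch only: the `ToolBudget` class, its method names, and the cost of one unit per call are assumptions, not the Viraltest implementation. Only the limit of 100 calls per episode and the tool names come from the outline.

```python
from dataclasses import dataclass, field


@dataclass
class ToolBudget:
    """Hypothetical per-episode API budget tracker (illustrative only)."""

    limit: int = 100  # 100 calls per episode, as stated in the outline
    used: int = 0
    log: list = field(default_factory=list)

    def call(self, tool_name: str) -> bool:
        """Record one tool call; return False once the budget is spent."""
        if self.used >= self.limit:
            return False
        self.used += 1
        self.log.append(tool_name)
        return True

    @property
    def remaining(self) -> int:
        return self.limit - self.used


budget = ToolBudget()
for _ in range(120):  # try to over-query a single tool
    budget.call("query_trends")
```

After the loop, only the first 100 calls have been accepted, so an agent that spends its whole budget on one tool has nothing left for `query_competitor` or `predict_engagement`.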

Slide 5: Training Pipeline

TRL GRPO on Qwen2.5-1.5B-Instruct (free Colab T4)
Reward: per-step env reward + 2× terminal grader score
200 episodes, batch 4, 50 GRPO steps
3 tasks: monthly_engage → monthly_strategic → monthly_competitive
Multi-episode chain: brand state persists across months
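
The reward line can be read as the following calculation. This is a sketch assuming an undiscounted sum of per-step rewards; the outline does not say whether rewards are discounted or normalized, and the function name `episode_return` is hypothetical.

```python
def episode_return(step_rewards, grader_score, terminal_weight=2.0):
    """Sum of per-step env rewards plus a weighted terminal grader score.

    The weight of 2.0 follows the "2x terminal grader score" in the outline;
    the undiscounted sum is an assumption.
    """
    return sum(step_rewards) + terminal_weight * grader_score


# Example with made-up numbers: two steps, then a grader score of 0.5.
example = episode_return([1.0, 2.0], 0.5)

# The episode count is consistent with 50 GRPO steps x batch 4.
total_episodes = 50 * 4
```

With these made-up inputs, the terminal term contributes 1.0 on top of 3.0 from the steps, so the grader score carries real weight relative to short-term per-step reward.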

Slide 6: Results

[Embed reward_curve.png — ascending curve over training]
[Embed before_after.png — smart baseline vs trained agent per task]
Trained agent: uses tools on day 1, adapts strategy by day 5, manages energy throughout
Score improvement on monthly_competitive: [X% → Y%]

Slide 7: Sources & Verifiability

4-tier source quality bar (peer-reviewed → industry → official → survey)
7 Tier-1 papers, 9 Tier-2 studies, 1 Tier-3 official statement
Every constant has a DOI/PMID/arXiv ID
Tier-5 SEO blogs, ranked below the four accepted tiers, explicitly rejected (13 sites listed with rationale)
Full bibliography: RESEARCH.md (~6 pages)
Any number in this presentation can be debated — we welcome it

Slide 8: Try It

HF Space: [link]
GitHub: [link]
Training notebook: [Colab link]
Blog: [HF post link]
Video: [YouTube link]
Questions?