TradePool: a self-improving trading coding agent (Laguna XS.2 LoRA)
Poolside × Prime Intellect Research Hackathon, Foundations track.
A LoRA adapter for poolside/Laguna-XS.2, trained with reinforcement learning so the
model becomes a coding agent that writes causal crypto trading-strategy functions,
scored by a leak-proof out-of-sample backtest.
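To make "causal" concrete, here is a minimal Python sketch of the kind of function the agent is asked to write, `strategy(features, position) -> target_position`, together with two indicators computed only from bars up to and including the current one. The helper `causal_features`, the feature names (`sma_fast`, `sma_slow`, `zscore_20`), the window lengths and the long-only target range [0, 1] are illustrative assumptions, not the actual `trade_pool` feature schema.

```python
from math import sqrt


def sma(prices, t, window):
    """Simple moving average of bars t-window+1 .. t; None until enough history exists."""
    if t + 1 < window:
        return None
    chunk = prices[t + 1 - window : t + 1]
    return sum(chunk) / window


def zscore(prices, t, window):
    """Distance of the bar-t close from its trailing mean, in trailing standard deviations."""
    if t + 1 < window:
        return None
    chunk = prices[t + 1 - window : t + 1]
    mean = sum(chunk) / window
    var = sum((p - mean) ** 2 for p in chunk) / window
    return 0.0 if var == 0 else (prices[t] - mean) / sqrt(var)


def causal_features(prices, t):
    """Hypothetical feature dict for bar t, built only from prices[: t + 1]."""
    return {
        "close": prices[t],
        "sma_fast": sma(prices, t, 5),
        "sma_slow": sma(prices, t, 20),
        "zscore_20": zscore(prices, t, 20),
    }


def strategy(features, position):
    """Trend-following with an overextension guard; returns a target position in [0, 1]."""
    fast, slow, z = features["sma_fast"], features["sma_slow"], features["zscore_20"]
    if fast is None or slow is None or z is None:
        return 0.0  # not enough history yet: stay flat
    if z > 2.0:
        return 0.0  # price stretched far above its trailing mean: step aside
    if fast > slow:
        return 1.0  # uptrend: go fully long
    if position > 0 and fast > 0.99 * slow:
        return position  # small hysteresis band: keep an existing long near the crossover
    return 0.0
```

Because every indicator slices `prices[: t + 1]`, changing any later price cannot change the features for bar `t`; that property is what makes an out-of-sample score meaningful.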
The idea in one line
Trading discipline that normally lives as prompt text (a memory file of rules) is turned into adapter weights by rewarding disciplined, profitable behaviour on held-out market data. The verifier is the backtest.
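A backtest that doubles as a verifier can be sketched as a walk-forward loop. This is an illustrative loop, not the `trade_pool` backtester: `backtest`, `feature_fn`, the 10 bps cost default and the long-only clamp are assumptions. The design choice worth copying is that the strategy receives only `prices[: t + 1]`, so lookahead is ruled out by construction rather than detected after the fact.

```python
def backtest(prices, strategy, feature_fn, cost_bps=10.0):
    """Walk-forward backtest (illustrative sketch).

    At the close of bar t the strategy sees only prices[: t + 1]; the target position
    it returns earns the bar-t -> bar-t+1 return, minus a cost on the traded amount.
    Returns (equity_curve, number_of_bars_with_a_trade).
    """
    cost_rate = cost_bps / 10_000.0
    equity = [1.0]
    position = 0.0
    trades = 0
    for t in range(len(prices) - 1):
        history = prices[: t + 1]  # future bars are never passed in
        target = float(strategy(feature_fn(history), position))
        target = min(max(target, 0.0), 1.0)  # clamp to the assumed long-only range
        turnover = abs(target - position)
        if turnover > 0:
            trades += 1
        bar_return = prices[t + 1] / prices[t] - 1.0
        equity.append(equity[-1] * (1.0 + target * bar_return - turnover * cost_rate))
        position = target
    return equity, trades
```

Re-slicing the history every bar costs O(n²) copying; a production backtester would pass a read-only view instead, but the slice keeps the causality guarantee easy to audit.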
How it works
- Environment (`verifiers` v0 `SingleTurnEnv`, pushed to `stimulir/trade-pool`): the agent is given a Base-chain token's in-sample price history plus a library of causal indicators (RSI, MACD, MAs, z-score, Bollinger, volatility) and must write `def strategy(features, position) -> target_position`.
- Verifier / reward: the strategy runs bar-by-bar over a held-out window (lookahead is structurally impossible because the function never sees future bars) and is scored by a weighted rubric:
  - OOS Sharpe (0.40) · beats buy-and-hold (0.20) · drawdown control (0.15) · sane exposure (0.10) · transaction cost (0.05) · valid + actually trades (0.10)
  - Hard gates: reward 0 for invalid code, lookahead, NaN equity, or do-nothing strategies.
- Training: Prime Hosted RL (GRPO) on `poolside/Laguna-XS.2`, 50 steps, batch 128, `rollouts_per_example=8`, `enable_thinking=false`. FREE hosted Laguna run.
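The rubric and hard gates, plus the group-relative advantage GRPO computes over the 8 rollouts per example, can be sketched as follows. Only the weights come from the card; the metric helpers, the assumption that each component has already been mapped to a score in [0, 1], the hourly annualisation factor and all function names are hypothetical. The advantage shown is the common mean/std normalisation from the GRPO formulation; Prime's hosted trainer may differ in details.

```python
import math
from statistics import mean, pstdev

# Rubric weights from the card; they sum to 1.0.
WEIGHTS = {
    "sharpe": 0.40,
    "beats_buy_and_hold": 0.20,
    "drawdown": 0.15,
    "exposure": 0.10,
    "cost": 0.05,
    "valid_and_trades": 0.10,
}


def sharpe_ratio(equity, periods_per_year=24 * 365):
    """Annualised Sharpe of per-bar returns with a zero risk-free rate (hourly bars assumed)."""
    rets = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    if len(rets) < 2 or pstdev(rets) == 0:
        return 0.0
    return mean(rets) / pstdev(rets) * math.sqrt(periods_per_year)


def max_drawdown(equity):
    """Largest peak-to-trough fall of the equity curve, as a fraction of the running peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, 1.0 - value / peak)
    return worst


def total_reward(scores, *, code_valid, lookahead, equity, trades):
    """Weighted rubric over per-component scores in [0, 1]; any hard gate forces 0."""
    if not code_valid or lookahead or trades == 0:
        return 0.0  # invalid code, lookahead, or a do-nothing strategy
    if any(math.isnan(v) for v in equity):
        return 0.0  # NaN equity
    return sum(w * min(max(scores[name], 0.0), 1.0) for name, w in WEIGHTS.items())


def group_advantages(rewards):
    """GRPO-style advantage: each rollout's reward standardised within its group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)  # identical rewards carry no learning signal
    return [(r - mu) / sigma for r in rewards]
```

With `rollouts_per_example=8`, `group_advantages` would receive the eight `total_reward` values for one prompt. Rollouts that beat their siblings are pushed up, which also means a group where every rollout hits a hard gate (all zeros) teaches nothing.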
Results
RL produced a clean reward climb on the training environment, rising monotonically to a peak around step 13 and then settling slightly below it:
| Stage | Total reward |
|---|---|
| step ~0 (baseline) | ~0.15 |
| step ~8 | 0.19 |
| step ~11 | 0.28 |
| step ~13 (peak) | ~0.42 |
| step ~50 (final) | ~0.34–0.41 |
Every rubric component improved together (not single-metric gaming):
reward_valid 0.30 → ~0.70 (writes valid trading code far more often),
reward_sharpe 0.10 → 0.33, and drawdown/exposure/cost all rose. Before training, a held-out-symbol
eval of base Laguna scored reward_valid 0.75 / reward_sharpe 0.45, confirming the env was in the
healthy trainable band.
The novel contribution: closing the self-improvement loop
- Weights channel: each RL iteration warm-starts from the prior adapter (`checkpoint_id`), a genuine parametric continuation.
- Curriculum channel: a reflection step reads the prior adapter's out-of-sample eval, shifts the next run's objective (sharpe → min-drawdown → balanced) and focuses on the weakest symbols, so the agent's own results drive its next curriculum.
- Falsifiable proof ("memory is the adapter"): the discipline block (distilled from 618 real prior trading decisions) can be stripped from the prompt (`use_seed_principles=false`); if the trained adapter stays disciplined, the rules now live in the weights, not the prompt.
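The three channels can be pictured as one planning step that turns the previous run's out-of-sample eval into the next run's settings. In the sketch below, `plan_next_iteration`, the eval dict shape, the Sharpe and drawdown thresholds and the rule "advance to the next objective once the current one is met" are all hypothetical; only `checkpoint_id`, `use_seed_principles` and the sharpe → min-drawdown → balanced order come from the card. The output is a plain dict, not Prime's config schema.

```python
from statistics import mean

OBJECTIVES = ["sharpe", "min_drawdown", "balanced"]  # order given in the card


def plan_next_iteration(prev_eval, prev_objective, prev_checkpoint_id, *,
                        k_weakest=2, sharpe_target=1.0, drawdown_target=0.20,
                        use_seed_principles=True):
    """Hypothetical reflection step: previous OOS eval -> next run's settings.

    prev_eval maps symbol -> {"sharpe": float, "max_drawdown": float}.
    The thresholds and the advance-when-met rule are illustrative assumptions.
    """
    met = {
        "sharpe": mean(m["sharpe"] for m in prev_eval.values()) >= sharpe_target,
        "min_drawdown": mean(m["max_drawdown"] for m in prev_eval.values()) <= drawdown_target,
        "balanced": False,  # final stage: stay here
    }
    idx = OBJECTIVES.index(prev_objective)
    if met[prev_objective] and idx + 1 < len(OBJECTIVES):
        idx += 1  # current objective achieved: move to the next one
    weakest = sorted(prev_eval, key=lambda s: prev_eval[s]["sharpe"])[:k_weakest]
    return {
        "checkpoint_id": prev_checkpoint_id,  # weights channel: warm-start from prior adapter
        "objective": OBJECTIVES[idx],  # curriculum channel: shifted objective
        "focus_symbols": weakest,  # curriculum channel: weakest symbols first
        "use_seed_principles": use_seed_principles,  # False = ablation, rules must live in weights
    }
```

Running the same plan with `use_seed_principles=False` corresponds to the falsification test: if rewards hold up without the discipline block in the prompt, the behaviour is coming from the adapter weights.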
Files
- `trade_pool/`: the full `verifiers` environment (features, causal backtester, executor, rubric, data); installable, builds to a wheel, and bundles its own OHLCV tape.
- `adapter/`: the trained LoRA adapter weights for `poolside/Laguna-XS.2`.
- `configs/`: the RL training config(s).
- `reward_curve.txt`, `eval_*.json`: training + eval metrics.
Reproduce
```bash
prime env push --path ./trade_pool --visibility PRIVATE   # -> <you>/trade-pool
prime eval run <you>/trade-pool -m poolside/laguna-xs.2 -n 8 -r 1
prime train run configs/iter_1.toml                       # FREE hosted Laguna RL
prime deployments create <adapter_id>                     # serve the adapter
```
Built at the Poolside London hackathon, 29–30 May 2026. Team: TradePool (Tosin Dairo).