replicalab / docs /map /agents.md
maxxie114's picture
Initial HF Spaces deployment
80d8c84

Agents Map β€” replicalab/agents/

Deterministic policy helpers for Scientist and Lab Manager agents. No LLM calls in this module β€” the LLM backend is injected via GenerateFn.

Tasks implemented: AGT 01-07, 11

Exports β€” __init__.py

# From lab_manager_policy
AlternativeSuggestion, FeasibilityCheckResult, SuggestionChange
check_feasibility, compose_lab_manager_response, suggest_alternative

# From scientist_policy
RetryMetadata, ScientistCallResult, ScientistOutputParseError
build_baseline_scientist_action, build_scientist_system_prompt
call_scientist_with_retry, format_scientist_observation, parse_scientist_output

Scientist Policy β€” scientist_policy.py

Pipeline Flow

scenario β†’ build_scientist_system_prompt() β†’ system_prompt
                                                    ↓
observation β†’ format_scientist_observation() β†’ user_message
                                                    ↓
              call_scientist_with_retry(generate_fn, system_prompt, obs)
                   ↓ calls generate_fn(messages)
                   ↓ calls parse_scientist_output(raw_text)
                   ↓ on failure: _build_correction_prompt(error)
                   ↓ retries up to max_retries times
                   β†’ ScientistCallResult(action, metadata)

Public Functions

build_scientist_system_prompt(scenario) -> str β€” AGT 01

Builds a domain-neutral system prompt from a NormalizedScenarioPack.

Sections rendered (in order):

  1. Role statement ("You are the Scientist agent in ReplicaLab")
  2. Job description (negotiate strongest feasible plan)
  3. Domain ID
  4. Task summary
  5. Success criteria (bulleted)
  6. Constraints (with hard/soft labels, quantities, comparators)
  7. Available resources (with availability status)
  8. Allowed substitutions (original β†’ alternative with conditions)
  9. Output contract (exactly one JSON, no extra keys)
  10. Allowed action_type values
  11. Action-specific field requirements

format_scientist_observation(obs: ScientistObservation) -> str β€” AGT 02

Converts a per-turn observation into the user message string.

Sections (fixed order, tested):

  1. Round status: "Round {n} of {max}"
  2. Paper summary: title, hypothesis, method, key finding, goal
  3. Conversation history or "No conversation history yet"
  4. Current protocol or "No protocol has been proposed yet"
  5. ScientistAction schema reminder (field list, action_type values)
  6. Closing instruction: "Respond with exactly one JSON object"

parse_scientist_output(raw_text: str) -> ScientistAction β€” MOD 09

Strict parser from raw model text into validated ScientistAction.

Accepts:

  • Plain JSON objects
  • \``json` fenced blocks
  • Prose containing one JSON object

Error codes:

Code Meaning
no_json No JSON object found in output
invalid_json JSON syntax error (trailing comma, etc.)
invalid_action Valid JSON but fails ScientistAction validation

call_scientist_with_retry(generate_fn, system_prompt, observation, max_retries=2) -> ScientistCallResult β€” AGT 03

Retry loop with error-specific correction prompts.

Behavior:

  1. Builds messages: [system, user]
  2. Calls generate_fn(messages) β†’ raw text
  3. Calls parse_scientist_output(raw_text)
  4. On success: returns ScientistCallResult(action, metadata)
  5. On failure: appends [assistant(bad_output), user(correction)] to messages, retries
  6. After max_retries failures: raises last ScientistOutputParseError

Correction prompts (_build_correction_prompt):

  • no_json: "Your previous response did not contain a JSON object..."
  • invalid_json: "Your previous response contained malformed JSON: {error}..."
  • invalid_action: "...failed ScientistAction validation: {detail}. Fix the validation error..."

build_baseline_scientist_action(observation) -> ScientistAction β€” AGT 04

Deterministic non-LLM action for smoke tests. No API calls.

Decision tree:

  1. If protocol exists AND at max rounds β†’ accept
  2. If protocol exists AND latest lab_manager feedback indicates blocker β†’ revise_protocol (halve sample, reduce duration)
  3. If protocol exists AND no blocker β†’ accept
  4. If no protocol β†’ propose_protocol (domain-inferred defaults)

Domain inference (_infer_domain):

  • Checks paper fields for ML hints (benchmark, dataset, gpu, bert...) β†’ machine_learning
  • Checks for finance hints (backtest, sharpe, trading...) β†’ finance_trading
  • Default β†’ mathematics

Blocker detection (_feedback_indicates_blocker):

  • Returns False if action_type is accept or report_feasibility
  • Otherwise checks message for blocker hints: booked, unavailable, exceeds, tight, budget, cost, etc.

Classes

ScientistOutputParseError(ValueError)

Attribute Type Purpose
code Literal["no_json", "invalid_json", "invalid_action"] Machine-readable error type
message str Human-readable detail
raw_text str Original model output
parsed_payload dict | None Decoded JSON if parsing succeeded

RetryMetadata(BaseModel) β€” extra="forbid"

Field Type Purpose
attempt_count int Total attempts (1 = success on first try)
retry_count int attempt_count - 1
last_error_code str | None Error code from last failure
last_error_message str | None Error message from last failure

ScientistCallResult(BaseModel) β€” extra="forbid"

Field Type
action ScientistAction
metadata RetryMetadata

Type Aliases

GenerateFn = Callable[[list[dict[str, str]]], str]

Constants

_ML_HINTS = ("benchmark", "dataset", "accuracy", "tokenizer", "train", "gpu", ...)
_FINANCE_HINTS = ("backtest", "drawdown", "sharpe", "trading", "slippage", ...)
_BLOCKER_HINTS = ("booked", "unavailable", "exceeds", "tight", "budget", "cost", ...)

Lab Manager Policy β€” lab_manager_policy.py

Pipeline Flow

protocol + scenario β†’ check_feasibility()
                           ↓
                    FeasibilityCheckResult (7 dimensions)
                           ↓
              suggest_alternative(protocol, check, scenario)
                           ↓
              AlternativeSuggestion | None
                           ↓
              compose_lab_manager_response(check, suggestion)
                           ↓
                    LabManagerAction (typed, with explanation)

Public Functions

check_feasibility(protocol, scenario) -> FeasibilityCheckResult β€” AGT 05

Runs 7 deterministic dimension checks. No LLM calls.

Checks performed:

Dimension Function What it checks
protocol _build_protocol_check Wraps validate_protocol() from MOD 05
budget _check_budget _estimate_protocol_cost() vs budget_remaining
equipment _check_equipment Items available/booked, finds substitutions
reagents _check_reagents Items in-stock/out-of-stock, finds substitutions
schedule _check_schedule duration_days vs time_limit_days
staff _check_staff _estimate_staff_load() vs staff_count
policy _check_policy Safety restrictions (e.g., offline-only execution)

Cost estimation (_estimate_protocol_cost):

base = sample_size * 10
+ duration_days * 50
+ len(controls) * 25
+ len(required_equipment) * 100
+ len(required_reagents) * 75

Staff estimation (_estimate_staff_load):

base = 1
+ (1 if sample_size > 20)
+ (1 if len(controls) > 2)
+ (1 if duration_days > 5)
+ (1 if len(required_equipment) > 2)

suggest_alternative(protocol, check_result, scenario) -> AlternativeSuggestion | None β€” AGT 06

Deterministic revision engine. Returns None if already feasible.

Fix order (deterministic):

  1. Equipment substitutions β€” replace booked items with alternatives
  2. Reagent substitutions β€” replace out-of-stock items with alternatives
  3. Duration clamp β€” reduce to time_limit_days if over
  4. Sample size reduction β€” iterative halving until budget fits (max 10 iterations)

Post-fix recheck: runs check_feasibility() on revised protocol. Returns: revised protocol, list of changes, remaining failures, pre/post checks.

compose_lab_manager_response(check_result, suggestion=None, explanation_renderer=None) -> LabManagerAction β€” AGT 07

Converts grounded results into a typed LabManagerAction.

Action type selection (_select_lab_manager_action_type):

Condition Action
All 7 dimensions pass ACCEPT
Suggestion exists AND improved AND only non-lab failures remain SUGGEST_ALTERNATIVE
Lab constraints fail AND no suggestion REJECT
Only policy/protocol fail (not lab constraints) REPORT_FEASIBILITY
Suggestion exists but didn't improve REJECT

Lab constraints = budget, equipment, reagents, schedule, staff (not protocol, not policy).

Classes

DimensionCheck(BaseModel) β€” extra="forbid"

Field Type Default
ok bool True
reasons list[str] []

FeasibilityCheckResult(BaseModel) β€” extra="forbid"

Field Type
protocol DimensionCheck
budget DimensionCheck
equipment DimensionCheck
reagents DimensionCheck
schedule DimensionCheck
staff DimensionCheck
policy DimensionCheck
estimated_cost float
required_staff int
substitution_options dict[str, list[str]]
validation_result ValidationResult

Computed properties: protocol_ok, budget_ok, equipment_ok, reagents_ok, schedule_ok, staff_ok, feasible, summary

SuggestionChange(BaseModel) β€” extra="forbid"

Field Type Purpose
field str Which protocol field was changed
original str Original value (stringified)
revised str New value (stringified)
reason str Why it was changed
tradeoff str What is lost

AlternativeSuggestion(BaseModel) β€” extra="forbid"

Field Type
revised_protocol Protocol
applied_changes list[SuggestionChange]
remaining_failures list[str]
improved bool
pre_check FeasibilityCheckResult
post_check FeasibilityCheckResult

Type Aliases

ExplanationRenderer = Callable[
    [LabManagerActionType, FeasibilityCheckResult, Optional[AlternativeSuggestion]],
    str,
]