Spaces:
Sleeping
Agents Map β replicalab/agents/
Deterministic policy helpers for Scientist and Lab Manager agents. No LLM calls in this module β the LLM backend is injected via
GenerateFn.Tasks implemented: AGT 01-07, 11
Exports β __init__.py
# From lab_manager_policy
AlternativeSuggestion, FeasibilityCheckResult, SuggestionChange
check_feasibility, compose_lab_manager_response, suggest_alternative
# From scientist_policy
RetryMetadata, ScientistCallResult, ScientistOutputParseError
build_baseline_scientist_action, build_scientist_system_prompt
call_scientist_with_retry, format_scientist_observation, parse_scientist_output
Scientist Policy β scientist_policy.py
Pipeline Flow
scenario β build_scientist_system_prompt() β system_prompt
β
observation β format_scientist_observation() β user_message
β
call_scientist_with_retry(generate_fn, system_prompt, obs)
β calls generate_fn(messages)
β calls parse_scientist_output(raw_text)
β on failure: _build_correction_prompt(error)
β retries up to max_retries times
β ScientistCallResult(action, metadata)
Public Functions
build_scientist_system_prompt(scenario) -> str β AGT 01
Builds a domain-neutral system prompt from a NormalizedScenarioPack.
Sections rendered (in order):
- Role statement ("You are the Scientist agent in ReplicaLab")
- Job description (negotiate strongest feasible plan)
- Domain ID
- Task summary
- Success criteria (bulleted)
- Constraints (with hard/soft labels, quantities, comparators)
- Available resources (with availability status)
- Allowed substitutions (original β alternative with conditions)
- Output contract (exactly one JSON, no extra keys)
- Allowed action_type values
- Action-specific field requirements
format_scientist_observation(obs: ScientistObservation) -> str β AGT 02
Converts a per-turn observation into the user message string.
Sections (fixed order, tested):
- Round status:
"Round {n} of {max}" - Paper summary: title, hypothesis, method, key finding, goal
- Conversation history or "No conversation history yet"
- Current protocol or "No protocol has been proposed yet"
- ScientistAction schema reminder (field list, action_type values)
- Closing instruction: "Respond with exactly one JSON object"
parse_scientist_output(raw_text: str) -> ScientistAction β MOD 09
Strict parser from raw model text into validated ScientistAction.
Accepts:
- Plain JSON objects
\``json` fenced blocks- Prose containing one JSON object
Error codes:
| Code | Meaning |
|---|---|
no_json |
No JSON object found in output |
invalid_json |
JSON syntax error (trailing comma, etc.) |
invalid_action |
Valid JSON but fails ScientistAction validation |
call_scientist_with_retry(generate_fn, system_prompt, observation, max_retries=2) -> ScientistCallResult β AGT 03
Retry loop with error-specific correction prompts.
Behavior:
- Builds messages:
[system, user] - Calls
generate_fn(messages)β raw text - Calls
parse_scientist_output(raw_text) - On success: returns
ScientistCallResult(action, metadata) - On failure: appends
[assistant(bad_output), user(correction)]to messages, retries - After
max_retriesfailures: raises lastScientistOutputParseError
Correction prompts (_build_correction_prompt):
no_json: "Your previous response did not contain a JSON object..."invalid_json: "Your previous response contained malformed JSON: {error}..."invalid_action: "...failed ScientistAction validation: {detail}. Fix the validation error..."
build_baseline_scientist_action(observation) -> ScientistAction β AGT 04
Deterministic non-LLM action for smoke tests. No API calls.
Decision tree:
- If protocol exists AND at max rounds β
accept - If protocol exists AND latest lab_manager feedback indicates blocker β
revise_protocol(halve sample, reduce duration) - If protocol exists AND no blocker β
accept - If no protocol β
propose_protocol(domain-inferred defaults)
Domain inference (_infer_domain):
- Checks paper fields for ML hints (benchmark, dataset, gpu, bert...) β
machine_learning - Checks for finance hints (backtest, sharpe, trading...) β
finance_trading - Default β
mathematics
Blocker detection (_feedback_indicates_blocker):
- Returns
Falseif action_type isacceptorreport_feasibility - Otherwise checks message for blocker hints: booked, unavailable, exceeds, tight, budget, cost, etc.
Classes
ScientistOutputParseError(ValueError)
| Attribute | Type | Purpose |
|---|---|---|
code |
Literal["no_json", "invalid_json", "invalid_action"] |
Machine-readable error type |
message |
str |
Human-readable detail |
raw_text |
str |
Original model output |
parsed_payload |
dict | None |
Decoded JSON if parsing succeeded |
RetryMetadata(BaseModel) β extra="forbid"
| Field | Type | Purpose |
|---|---|---|
attempt_count |
int |
Total attempts (1 = success on first try) |
retry_count |
int |
attempt_count - 1 |
last_error_code |
str | None |
Error code from last failure |
last_error_message |
str | None |
Error message from last failure |
ScientistCallResult(BaseModel) β extra="forbid"
| Field | Type |
|---|---|
action |
ScientistAction |
metadata |
RetryMetadata |
Type Aliases
GenerateFn = Callable[[list[dict[str, str]]], str]
Constants
_ML_HINTS = ("benchmark", "dataset", "accuracy", "tokenizer", "train", "gpu", ...)
_FINANCE_HINTS = ("backtest", "drawdown", "sharpe", "trading", "slippage", ...)
_BLOCKER_HINTS = ("booked", "unavailable", "exceeds", "tight", "budget", "cost", ...)
Lab Manager Policy β lab_manager_policy.py
Pipeline Flow
protocol + scenario β check_feasibility()
β
FeasibilityCheckResult (7 dimensions)
β
suggest_alternative(protocol, check, scenario)
β
AlternativeSuggestion | None
β
compose_lab_manager_response(check, suggestion)
β
LabManagerAction (typed, with explanation)
Public Functions
check_feasibility(protocol, scenario) -> FeasibilityCheckResult β AGT 05
Runs 7 deterministic dimension checks. No LLM calls.
Checks performed:
| Dimension | Function | What it checks |
|---|---|---|
protocol |
_build_protocol_check |
Wraps validate_protocol() from MOD 05 |
budget |
_check_budget |
_estimate_protocol_cost() vs budget_remaining |
equipment |
_check_equipment |
Items available/booked, finds substitutions |
reagents |
_check_reagents |
Items in-stock/out-of-stock, finds substitutions |
schedule |
_check_schedule |
duration_days vs time_limit_days |
staff |
_check_staff |
_estimate_staff_load() vs staff_count |
policy |
_check_policy |
Safety restrictions (e.g., offline-only execution) |
Cost estimation (_estimate_protocol_cost):
base = sample_size * 10
+ duration_days * 50
+ len(controls) * 25
+ len(required_equipment) * 100
+ len(required_reagents) * 75
Staff estimation (_estimate_staff_load):
base = 1
+ (1 if sample_size > 20)
+ (1 if len(controls) > 2)
+ (1 if duration_days > 5)
+ (1 if len(required_equipment) > 2)
suggest_alternative(protocol, check_result, scenario) -> AlternativeSuggestion | None β AGT 06
Deterministic revision engine. Returns None if already feasible.
Fix order (deterministic):
- Equipment substitutions β replace booked items with alternatives
- Reagent substitutions β replace out-of-stock items with alternatives
- Duration clamp β reduce to
time_limit_daysif over - Sample size reduction β iterative halving until budget fits (max 10 iterations)
Post-fix recheck: runs check_feasibility() on revised protocol.
Returns: revised protocol, list of changes, remaining failures, pre/post checks.
compose_lab_manager_response(check_result, suggestion=None, explanation_renderer=None) -> LabManagerAction β AGT 07
Converts grounded results into a typed LabManagerAction.
Action type selection (_select_lab_manager_action_type):
| Condition | Action |
|---|---|
| All 7 dimensions pass | ACCEPT |
| Suggestion exists AND improved AND only non-lab failures remain | SUGGEST_ALTERNATIVE |
| Lab constraints fail AND no suggestion | REJECT |
| Only policy/protocol fail (not lab constraints) | REPORT_FEASIBILITY |
| Suggestion exists but didn't improve | REJECT |
Lab constraints = budget, equipment, reagents, schedule, staff (not protocol, not policy).
Classes
DimensionCheck(BaseModel) β extra="forbid"
| Field | Type | Default |
|---|---|---|
ok |
bool |
True |
reasons |
list[str] |
[] |
FeasibilityCheckResult(BaseModel) β extra="forbid"
| Field | Type |
|---|---|
protocol |
DimensionCheck |
budget |
DimensionCheck |
equipment |
DimensionCheck |
reagents |
DimensionCheck |
schedule |
DimensionCheck |
staff |
DimensionCheck |
policy |
DimensionCheck |
estimated_cost |
float |
required_staff |
int |
substitution_options |
dict[str, list[str]] |
validation_result |
ValidationResult |
Computed properties: protocol_ok, budget_ok, equipment_ok, reagents_ok, schedule_ok, staff_ok, feasible, summary
SuggestionChange(BaseModel) β extra="forbid"
| Field | Type | Purpose |
|---|---|---|
field |
str |
Which protocol field was changed |
original |
str |
Original value (stringified) |
revised |
str |
New value (stringified) |
reason |
str |
Why it was changed |
tradeoff |
str |
What is lost |
AlternativeSuggestion(BaseModel) β extra="forbid"
| Field | Type |
|---|---|
revised_protocol |
Protocol |
applied_changes |
list[SuggestionChange] |
remaining_failures |
list[str] |
improved |
bool |
pre_check |
FeasibilityCheckResult |
post_check |
FeasibilityCheckResult |
Type Aliases
ExplanationRenderer = Callable[
[LabManagerActionType, FeasibilityCheckResult, Optional[AlternativeSuggestion]],
str,
]