
Validation Map: replicalab/utils/validation.py

Deterministic protocol validation against scenario constraints. Pure functions: no LLM calls, no side effects.

Tasks implemented: MOD 05

Public API

validate_protocol(protocol: Protocol, scenario: NormalizedScenarioPack) -> ValidationResult

Main entry point. Never raises; always returns a ValidationResult.

Checks run (in order):

  1. _check_obvious_impossibilities: sample_size < 1, no controls, duration_days < 1
  2. _check_duration_vs_time_limit: protocol duration_days vs lab time_limit_days
  3. _check_equipment_vocabulary: items vs available/booked/substitutable
  4. _check_reagent_vocabulary: items vs in-stock/out-of-stock/substitutable
  5. _check_required_element_coverage: protocol text vs hidden_reference_spec.required_elements

Result: valid=True only if zero ERROR-level issues.
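
The ordering and the valid rule above can be illustrated with a small aggregation loop. This is a minimal sketch, not the module's code: `Issue`, `run_checks`, the two toy checks, and the use of plain dicts for the protocol and scenario are all assumptions made for illustration.

```python
from typing import Callable, NamedTuple


class Issue(NamedTuple):
    severity: str  # "error" or "warning", the two IssueSeverity values
    category: str
    message: str


# Assumed check signature: (protocol, scenario) -> list of issues.
Check = Callable[[dict, dict], list[Issue]]


def run_checks(
    protocol: dict, scenario: dict, checks: list[Check]
) -> tuple[bool, list[Issue]]:
    """Run checks in the given order; valid is True only with zero errors."""
    issues: list[Issue] = []
    for check in checks:
        issues.extend(check(protocol, scenario))
    valid = not any(issue.severity == "error" for issue in issues)
    return valid, issues


# Two toy checks (hypothetical stand-ins for the private check functions).
def check_sample_size(protocol: dict, scenario: dict) -> list[Issue]:
    if protocol["sample_size"] < 1:
        return [Issue("error", "sample_size", "sample_size must be at least 1")]
    return []


def check_controls(protocol: dict, scenario: dict) -> list[Issue]:
    if not protocol["controls"]:
        return [Issue("warning", "controls", "protocol lists no controls")]
    return []


checks = [check_sample_size, check_controls]
valid, issues = run_checks({"sample_size": 10, "controls": []}, {}, checks)
bad_valid, bad_issues = run_checks({"sample_size": 0, "controls": []}, {}, checks)
```

In this sketch a protocol that only triggers warnings still comes back as valid, while a single error flips the flag; warnings are reported either way.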

Data Classes

IssueSeverity(str, Enum)

| Value | Meaning |
| --- | --- |
| error | Hard failure: protocol cannot proceed |
| warning | Advisory: protocol is suboptimal but possible |

ValidationIssue(BaseModel), extra="forbid"

| Field | Type | Example |
| --- | --- | --- |
| severity | IssueSeverity | ERROR |
| category | str | "equipment", "duration", "sample_size" |
| message | str | "Equipment 'X' is booked and has no substitution." |

ValidationResult(BaseModel), extra="forbid"

| Field | Type |
| --- | --- |
| valid | bool |
| issues | list[ValidationIssue] |

Properties:

  • errors → list[ValidationIssue] (severity=ERROR only)
  • warnings → list[ValidationIssue] (severity=WARNING only)
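
The shape of these classes can be sketched with the standard library. Note the assumptions: the real models are Pydantic `BaseModel` subclasses with `extra="forbid"`, whereas this stand-in uses `dataclasses` so that it runs without third-party packages, and the uppercase enum member names are inferred from the `ERROR`/`WARNING` labels used in this document.

```python
from dataclasses import dataclass, field
from enum import Enum


class IssueSeverity(str, Enum):
    ERROR = "error"      # member names are an assumption; values are from the table
    WARNING = "warning"


@dataclass(frozen=True)
class ValidationIssue:
    severity: IssueSeverity
    category: str
    message: str


@dataclass
class ValidationResult:
    valid: bool
    issues: list[ValidationIssue] = field(default_factory=list)

    @property
    def errors(self) -> list[ValidationIssue]:
        # Only the ERROR-level issues, in their original order.
        return [i for i in self.issues if i.severity is IssueSeverity.ERROR]

    @property
    def warnings(self) -> list[ValidationIssue]:
        # Only the WARNING-level issues, in their original order.
        return [i for i in self.issues if i.severity is IssueSeverity.WARNING]


result = ValidationResult(
    valid=False,
    issues=[
        ValidationIssue(
            IssueSeverity.ERROR,
            "equipment",
            "Equipment 'X' is booked and has no substitution.",
        ),
        ValidationIssue(IssueSeverity.WARNING, "controls", "No controls listed."),
    ],
)
```

Because `IssueSeverity` mixes in `str`, a member compares equal to its plain string value, which keeps serialized output readable.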

Check Details

_check_obvious_impossibilities

| Condition | Severity | Category |
| --- | --- | --- |
| sample_size < 1 | ERROR | sample_size |
| controls empty | WARNING | controls |
| duration_days < 1 | ERROR | duration |
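
The three conditions translate directly into code. The following is a minimal sketch under stated assumptions: the protocol is modelled as a plain dict rather than the `Protocol` model, issues are `(severity, category, message)` tuples, and the message strings are invented for illustration.

```python
def check_obvious_impossibilities(protocol: dict) -> list[tuple[str, str, str]]:
    """Return (severity, category, message) tuples for the three conditions."""
    issues = []
    if protocol["sample_size"] < 1:
        issues.append(("error", "sample_size", "sample_size must be at least 1"))
    if not protocol["controls"]:
        # An empty controls list is advisory only, so it does not block the protocol.
        issues.append(("warning", "controls", "no controls listed"))
    if protocol["duration_days"] < 1:
        issues.append(("error", "duration", "duration_days must be at least 1"))
    return issues


one_error = check_obvious_impossibilities(
    {"sample_size": 0, "controls": ["vehicle"], "duration_days": 3}
)
only_warning = check_obvious_impossibilities(
    {"sample_size": 12, "controls": [], "duration_days": 3}
)
```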

_check_duration_vs_time_limit

| Condition | Severity | Category |
| --- | --- | --- |
| duration_days > time_limit_days | ERROR | duration |

_check_equipment_vocabulary

| Condition | Severity | Category |
| --- | --- | --- |
| Item available | none (pass) | n/a |
| Item booked + has substitution | WARNING | equipment |
| Item booked + no substitution | ERROR | equipment |
| Item unknown (not in inventory) | WARNING | equipment |
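
The decision table above can be read as a small classification function. This sketch is illustrative only: `classify_equipment` and its three set arguments are hypothetical, and the sets are assumed to already hold normalized (lowercased, whitespace-collapsed) labels.

```python
from typing import Optional


def classify_equipment(
    item: str,
    available: set[str],
    booked: set[str],
    substitutable: set[str],
) -> Optional[str]:
    """Return None for a pass, otherwise "warning" or "error"."""
    key = " ".join(item.lower().split())  # lowercase, strip, collapse whitespace
    if key in available:
        return None
    if key in booked:
        # A booked item is only fatal when nothing can stand in for it.
        return "warning" if key in substitutable else "error"
    return "warning"  # unknown item: not in the inventory at all


available = {"centrifuge"}
booked = {"confocal microscope", "plate reader"}
substitutable = {"confocal microscope"}

verdicts = {
    name: classify_equipment(name, available, booked, substitutable)
    for name in ["Centrifuge", "  Confocal   Microscope ", "plate reader", "flux capacitor"]
}
```

The reagent table that follows has the same structure, with in-stock and out-of-stock playing the roles of available and booked.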

_check_reagent_vocabulary

| Condition | Severity | Category |
| --- | --- | --- |
| Item in stock | none (pass) | n/a |
| Item out of stock + has substitution | WARNING | reagent |
| Item out of stock + no substitution | ERROR | reagent |
| Item unknown (not in inventory) | WARNING | reagent |

_check_required_element_coverage

Checks each element of hidden_reference_spec.required_elements against the protocol's text fields using token matching.

  • Protocol text searched: technique, rationale, controls, equipment, reagents (joined, lowercased).
  • Token extraction: _element_tokens(element) splits on spaces, keeps tokens with 3+ chars.
  • Match: any token from element found in protocol text → covered.

| Condition | Severity | Category |
| --- | --- | --- |
| Element not addressed | WARNING | required_element |
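
A rough sketch of the token matching described above, under two assumptions not confirmed by this document: tokens are lowercased before comparison (the searched text is lowercased, so this seems necessary), and "found in protocol text" means a substring test on the joined text. The function names and the dict-shaped protocol are illustrative.

```python
def element_tokens(element: str) -> list[str]:
    """Split an element on whitespace and keep tokens of 3+ characters."""
    return [tok for tok in element.lower().split() if len(tok) >= 3]


def protocol_text(protocol: dict) -> str:
    """Join the searched fields into one lowercased string."""
    parts = [protocol["technique"], protocol["rationale"]]
    for list_field in ("controls", "equipment", "reagents"):
        parts.extend(protocol[list_field])
    return " ".join(parts).lower()


def is_covered(element: str, text: str) -> bool:
    """An element counts as covered if any one of its tokens appears in the text."""
    return any(tok in text for tok in element_tokens(element))


protocol = {
    "technique": "Western blot",
    "rationale": "Quantify protein expression",
    "controls": ["beta-actin loading control"],
    "equipment": ["gel imager"],
    "reagents": ["primary antibody"],
}
text = protocol_text(protocol)
required = ["Loading control", "dose response curve"]
missing = [element for element in required if not is_covered(element, text)]
```

Under the any-token rule the match is lenient: one shared word is enough to mark an element as covered. With the substring assumption, a short token can also match inside a longer word.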

Internal Helpers

| Function | Purpose |
| --- | --- |
| _normalize(label) | Lowercase, strip, collapse whitespace |
| _element_tokens(element) | Split element string into searchable tokens (3+ chars) |
| _substitution_alternatives(scenario) | Set of normalized original items from allowed_substitutions |
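
For orientation, the first and last helpers might look roughly like this. The sketch assumes that `allowed_substitutions` is a mapping from an original item to its alternative; the document does not confirm that shape, and the public names used here are stand-ins for the private helpers.

```python
def normalize(label: str) -> str:
    """Lowercase, strip, and collapse runs of whitespace to single spaces."""
    return " ".join(label.lower().split())


def substitution_originals(allowed_substitutions: dict[str, str]) -> set[str]:
    """Collect the normalized original items, i.e. those that can be swapped out."""
    return {normalize(original) for original in allowed_substitutions}


originals = substitution_originals(
    {"Confocal  Microscope": "widefield microscope", " FBS ": "BSA"}
)
```

Normalizing both the inventory labels and the protocol's items the same way lets membership tests ignore differences in case and spacing.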

Who Consumes This

  • lab_manager_policy.py: check_feasibility() calls validate_protocol() and wraps the result in the protocol DimensionCheck
  • scoring/ (future): the JDG 01 rigor score will reuse _element_tokens for required element matching