MERIT-XS (Research Preview)

MERIT-XS is an early multilingual moderation encoder developed by Meridian Safety for research into compact, Unicode-aware safety classification systems.

This release packages a binary toxicity research preview built from:

  • MERIT-XS encoder pretraining
  • a top-2-layer moderation adaptation run
  • a binary moderation head

This artifact is not production-ready and should not be used as a standalone safety system.

Included files

  • merit_xs_preview.pt
    • exported moderation artifact with adapted encoder weights and binary head
  • infer_merit_xs.py
    • CLI inference entrypoint
  • load_merit_xs.py
    • simple Python loader for local use
  • metrics_summary.json
    • dev/test metrics and threshold sweep summary
  • profile_summary.json
    • lightweight export-run timing summary
  • merit/
    • local model package
  • assets/tokenizers/merit/
    • tokenizer files

Setup

pip install -r requirements.txt

License

This package uses the included LICENSE.txt:

  • MERIT Research Preview License (MRPL v1.0)
  • research, evaluation, and benchmarking use are allowed
  • commercial deployment and hosted/public API use require separate permission

CLI usage

python infer_merit_xs.py \
  --text "you are awful" \
  --text "thanks for your help"

You can also pass an explicit checkpoint path:

python infer_merit_xs.py \
  --checkpoint merit_xs_preview.pt \
  --text "you are a stupid idiot"

Python usage

from load_merit_xs import load_merit_xs

model = load_merit_xs()
results = model.predict(
    [
        "you are awful",
        "thanks for your help",
        "you are a stupid idiot",
    ]
)
print(results)
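
One way to consume the predictions is to bucket them by decision band. The sketch below assumes each prediction is a dict-like mapping with the keys listed under "Output schema"; the actual return type of predict is not documented here. The helper name group_by_decision is hypothetical, and the sample values are made up for illustration, not real model output.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence, Tuple


def group_by_decision(
    texts: Sequence[str],
    predictions: Sequence[Mapping[str, object]],
) -> Dict[str, List[Tuple[str, float]]]:
    """Group (text, score) pairs under their decision label.

    Assumption: each prediction exposes "decision" and "score" keys.
    """
    if len(texts) != len(predictions):
        raise ValueError("texts and predictions must have the same length")
    buckets: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for text, pred in zip(texts, predictions):
        buckets[str(pred["decision"])].append((text, float(pred["score"])))
    return dict(buckets)


# Illustrative inputs only; these scores are invented, not model output.
sample_texts = ["message one", "message two", "message three"]
sample_predictions = [
    {"score": 0.10, "decision": "allow"},
    {"score": 0.55, "decision": "review"},
    {"score": 0.05, "decision": "allow"},
]

grouped = group_by_decision(sample_texts, sample_predictions)
for decision, items in grouped.items():
    print(decision, items)
```

With real output, sample_predictions would be replaced by the value returned from model.predict, provided it matches the assumed shape.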

Output schema

Each prediction returns:

  • score
    • sigmoid(logit)
  • decision
    • allow | review | action
  • confidence
    • threshold-distance heuristic only
  • decision_band
    • same band label used for the decision

Important: confidence is a preview-time decision-margin heuristic based on distance from the decision threshold. It is not a calibrated probability.
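
The relationship between score, decision, and a threshold-distance confidence can be illustrated with a minimal sketch. The two thresholds (0.40 and 0.70), the function names, and the exact margin formula are assumptions chosen for illustration; this card does not state the thresholds or the heuristic the preview actually uses.

```python
import math

# Hypothetical thresholds, for illustration only. The preview's real
# thresholds are not stated in this card.
REVIEW_THRESHOLD = 0.40
ACTION_THRESHOLD = 0.70


def sigmoid(logit: float) -> float:
    """Numerically stable sigmoid mapping a logit to a score in [0, 1]."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def band_for_score(score: float) -> str:
    """Map a score to one of the three decision bands."""
    if score >= ACTION_THRESHOLD:
        return "action"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "allow"


def margin_confidence(score: float) -> float:
    """Distance from the score to the nearest threshold (assumed heuristic).

    A small value means the score sits close to a band boundary; it is
    not a probability that the decision is correct.
    """
    return min(abs(score - REVIEW_THRESHOLD), abs(score - ACTION_THRESHOLD))


def describe(logit: float) -> dict:
    """Build a record with the four fields listed in the output schema."""
    score = sigmoid(logit)
    band = band_for_score(score)
    return {
        "score": score,
        "decision": band,
        "confidence": margin_confidence(score),
        "decision_band": band,
    }


print(describe(0.0))
```

Under these assumed thresholds, a logit of 0 gives a score of 0.5, which falls in the review band with a margin of about 0.1 from the nearest threshold. This shows why a margin-style value should not be read as a calibrated probability.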

Current limitations

  • Binary toxicity preview only
  • Not a full moderation taxonomy
  • Weak coverage for some safety categories, including self-harm / threat-style language
  • Message-level only
  • Incomplete multilingual and adversarial evaluation

Research note

This package is intended for:

  • research
  • benchmarking
  • representation-transfer experiments
  • moderation evaluation

It is not intended for:

  • production moderation
  • safety-critical enforcement
  • fully automated policy decisions

Existing model card

This package is being prepared for the existing Hugging Face repo:

MeridianSafety/MERIT-XS-Preview
