MERIT-XS (Research Preview)

MERIT-XS is an early multilingual moderation encoder developed by Meridian Safety for research into compact, Unicode-aware safety classification systems.

This release packages a binary toxicity research preview built from:

MERIT-XS encoder pretraining
a moderation adaptation run on the top two encoder layers
a binary moderation head

This artifact is not production-ready and should not be used as a standalone safety system.

Included files

merit_xs_preview.pt
- exported moderation artifact with adapted encoder weights and binary head
infer_merit_xs.py
- CLI inference entrypoint
load_merit_xs.py
- simple Python loader for local use
metrics_summary.json
- dev/test metrics and threshold sweep summary
profile_summary.json
- lightweight export-run timing summary
merit/
- local model package
assets/tokenizers/merit/
- tokenizer files

Setup

pip install -r requirements.txt

License

This package is distributed under the MERIT Research Preview License (MRPL v1.0), included as LICENSE.txt:

research, evaluation, and benchmarking use are allowed
commercial deployment and hosted/public API use require separate permission

CLI usage

python infer_merit_xs.py \
  --text "you are awful" \
  --text "thanks for your help"

You can also pass an explicit checkpoint path:

python infer_merit_xs.py \
  --checkpoint merit_xs_preview.pt \
  --text "you are a stupid idiot"
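
For scripted runs, the same command line can be assembled from Python. The sketch below relies only on the `--text` and `--checkpoint` flags shown above; the helper names `build_command` and `run_cli` are hypothetical and not part of the package, and the format of the CLI's output is not assumed, so stdout is returned as raw text.

```python
import subprocess
import sys


def build_command(texts, checkpoint=None):
    """Build the argv list for infer_merit_xs.py, one --text flag per input."""
    cmd = [sys.executable, "infer_merit_xs.py"]
    if checkpoint is not None:
        cmd += ["--checkpoint", checkpoint]
    for text in texts:
        cmd += ["--text", text]
    return cmd


def run_cli(texts, checkpoint=None):
    """Run the CLI from the package directory and return its raw stdout."""
    completed = subprocess.run(
        build_command(texts, checkpoint),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits non-zero
    )
    return completed.stdout
```

Passing arguments as a list avoids shell quoting problems when the input text contains quotes or other special characters.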

Python usage

from load_merit_xs import load_merit_xs

model = load_merit_xs()
results = model.predict(
    [
        "you are awful",
        "thanks for your help",
        "you are a stupid idiot",
    ]
)
print(results)
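
A common next step is to group inputs by decision, for example to collect everything marked for review. The sketch below assumes `predict` returns one dict per input text, in input order, with the keys listed under Output schema; that return shape is an assumption, so check `load_merit_xs.py` before relying on it. The helper `group_by_decision` is hypothetical, and the records are made-up placeholders, not real model output.

```python
from collections import defaultdict


def group_by_decision(texts, results):
    """Group input texts by the `decision` field of their predictions."""
    groups = defaultdict(list)
    for text, result in zip(texts, results):
        groups[result["decision"]].append((text, result["score"]))
    return dict(groups)


# Made-up records standing in for real model output (assumed shape).
example_texts = ["first message", "second message"]
example_results = [
    {"score": 0.10, "decision": "allow"},
    {"score": 0.60, "decision": "review"},
]

grouped = group_by_decision(example_texts, example_results)
# Items in the "review" group would go to a human reviewer.
print(grouped.get("review", []))
```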

Output schema

Each prediction returns:

score
- sigmoid(logit)
decision
- allow | review | action
confidence
- threshold-distance heuristic only
decision_band
- same band label used for the decision

Important: confidence is a preview-time decision-margin heuristic, not a calibrated probability.
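
To make the relationship between these fields concrete, here is a minimal sketch; it is not the packaged implementation. The two threshold values and the helper names are hypothetical, and the real band thresholds and confidence formula are defined by the checkpoint and the inference code.

```python
import math

# Hypothetical thresholds, for illustration only.
REVIEW_THRESHOLD = 0.5
ACTION_THRESHOLD = 0.8


def sigmoid(logit: float) -> float:
    """Numerically stable sigmoid mapping a raw logit to a score in (0, 1)."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def to_prediction(logit: float) -> dict:
    """Build a record shaped like the output schema from a single logit."""
    score = sigmoid(logit)
    if score >= ACTION_THRESHOLD:
        band = "action"
    elif score >= REVIEW_THRESHOLD:
        band = "review"
    else:
        band = "allow"
    # Distance from the score to the nearest band boundary. This is a
    # decision margin, not the probability that the decision is correct.
    margin = min(abs(score - REVIEW_THRESHOLD), abs(score - ACTION_THRESHOLD))
    return {
        "score": score,
        "decision": band,
        "confidence": margin,
        "decision_band": band,
    }
```

Under this reading, a score sitting exactly on a threshold has a confidence of zero no matter how extreme the threshold itself is, so the value should not be interpreted as a probability.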

Current limitations

Binary toxicity preview only
Not a full moderation taxonomy
Weak coverage for some safety categories, including self-harm / threat-style language
Message-level only
Incomplete multilingual and adversarial evaluation

Research note

This package is intended for:

research
benchmarking
representation-transfer experiments
moderation evaluation

It is not intended for:

production moderation
safety-critical enforcement
fully automated policy decisions

Existing model card

This package is being prepared for the existing Hugging Face repo:

MeridianSafety/MERIT-XS-Preview
