Access CoPE-B-A4B-MM (commercial license required)

CoPE-B-A4B-MM is a proprietary multimodal model from Zentropi. Access requires a commercial license. Please provide your contact information and intended use case so we can route your request.

CoPE-B-A4B-MM: The COntent Policy Evaluator Model (Multimodal Variant)

Model Overview

CoPE-B-A4B-MM is the multimodal 2nd-generation Content Policy Evaluator model from Zentropi, built on Google's Gemma-4-26B-A4B-it Mixture-of-Experts architecture. It performs accurate content classification — for both text and images/video — based on developer-customizable policies.

This is the multimodal variant. It accepts text, images/video, or both, and classifies that content against your policy. For text-only deployments, see also zentropi-ai/cope-b-a4b, an open, text-only companion model.

Full methodology, training recipe, and evaluation details behind CoPE are described in our paper: "CoPE: A Small Language Model for Steerable and Scalable Content Labeling" (arXiv:2512.18027).

Key Features

Native visual understanding: policy-conditioned image/video classification with the same prompt format as text
Improved steerability and context length vs CoPE-A-9B
Policy-adaptive content classification across text and visuals (no fixed taxonomy)
High-accuracy, low-latency binary labels
Mixture-of-Experts efficiency: 25.2B total / 3.8B active parameters
Frontier-level capability at consumer-GPU inference cost

Getting Started

You can use CoPE-B-A4B-MM (subject to the commercial license; see License below) in three ways:

Zentropi API — fastest path, with a generous free tier (no infra required)
Self-hosted vLLM (H200 recommended) — for production-scale serving on your own infrastructure
Direct inference in Python — load via Transformers (text + image inputs); see this Colab notebook for a working example

See the Running the Model section below for details on each.

Technical Specifications

Model Architecture

CoPE-B-A4B-MM is built on top of Google's Gemma-4-26B-A4B-it:

25.2 billion total parameters across 128 experts per layer
3.8 billion active parameters per forward pass (top-k=8 of 128 experts)
256K-token context window
Native bidirectional vision tower
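As a quick arithmetic check on the figures above, the sketch below computes the share of parameters and of experts that are active per forward pass. The variable names are illustrative; the numbers are the ones listed in this section.

```python
# Figures quoted in the Model Architecture list above.
TOTAL_PARAMS_B = 25.2     # total parameters, in billions
ACTIVE_PARAMS_B = 3.8     # active parameters per forward pass, in billions
EXPERTS_PER_LAYER = 128
ACTIVE_EXPERTS = 8        # top-k routing

active_param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
active_expert_fraction = ACTIVE_EXPERTS / EXPERTS_PER_LAYER

print(f"active parameters: {active_param_fraction:.1%}")         # 15.1%
print(f"active experts per layer: {active_expert_fraction:.2%}")  # 6.25%
```

The parameter share is larger than the expert share, presumably because non-expert weights such as attention and embeddings are used for every token; this card does not give that breakdown.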

CoPE-B-A4B-MM was fine-tuned with LoRA, and the adapter weights were then merged into the base model, so it runs directly without adapters.

Input Format

The CoPE prompt can be paired with text content, image/video content, or both:

Examine the given POLICY and determine if the given CONTENT meets the criteria for ANY of the LABELS. Answer "1" if yes, and "0" if no.


POLICY
======

[policy text]


CONTENT
=======

[content text, or "[image]/[video]" if attaching an image or video]
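The template above can be assembled programmatically. The helper below is a minimal sketch: the name `build_cope_prompt` is hypothetical, and the blank-line spacing is copied from the template as shown here, so it is worth checking against the Colab notebook before relying on it.

```python
INSTRUCTION = (
    'Examine the given POLICY and determine if the given CONTENT meets the '
    'criteria for ANY of the LABELS. Answer "1" if yes, and "0" if no.'
)


def build_cope_prompt(policy: str, content: str = "[image]") -> str:
    """Assemble a CoPE prompt (hypothetical helper, spacing is an assumption).

    `content` is the text to classify, or a placeholder such as "[image]"
    when the actual content is attached as an image.
    """
    return (
        f"{INSTRUCTION}\n\n\n"
        f"POLICY\n======\n\n{policy.strip()}\n\n\n"
        f"CONTENT\n=======\n\n{content.strip()}"
    )


cope_prompt = build_cope_prompt("No spam.", "Buy now!")
```

The resulting `cope_prompt` string is what the snippets in this section pass as the user-turn text.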

For text-only inputs, pass the prompt as a single user-turn text message:

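messages = [{"role": "user", "content": cope_prompt}]
# tokenize=True + return_dict=True make the processor return tensors, not a string
inputs = processor.apply_chat_template(messages, add_generation_prompt=True,
                                       tokenize=True, return_dict=True, return_tensors="pt")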

For image inputs, attach the image to the user-turn content:

messages = [{"role": "user", "content": [
    {"type": "text",  "text":  cope_prompt},
    {"type": "image", "image": pil_image}
]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
)

Important: Creating high-quality labeling criteria is the key to unlocking superior performance, so we've created the Zentropi system to enable rapid generation, testing, and tuning of policies that are optimized for CoPE interpretability. It is free for anyone to get started.

Output Format

CoPE-B-A4B-MM provides binary classification outputs as a single token:

0: None of the policy labels apply to the content
1: One or more policy labels apply to the content
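Because the output is a single "0" or "1" token, downstream code can map it to a boolean and fail loudly on anything else. This is a minimal sketch; `parse_cope_label` is a hypothetical helper name, and it assumes the generated text has already been decoded to a string.

```python
def parse_cope_label(generated_text: str) -> bool:
    """Map the decoded model output to True (a label applies) or False."""
    token = generated_text.strip()
    if token == "1":
        return True
    if token == "0":
        return False
    # Anything else usually means the prompt or chat template was not applied
    # as expected, so surface it rather than guessing.
    raise ValueError(f"unexpected CoPE output: {generated_text!r}")
```

Raising on unexpected output, rather than defaulting to "0", keeps formatting mistakes from silently passing content through.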

System Requirements

Recommended: H200 (141GB VRAM) for production serving via vLLM. A100 80GB is also supported with the --max-num-batched-tokens 8192 override.
Inference latency comparable to a 4B-parameter dense model at batch=1, due to MoE's sparse activation
Compatible with vLLM ≥ 0.20.2 for production serving

Training Details

For the full training recipe (hyperparameters, contradictory-policy dataset construction, ablation studies), see our paper: "CoPE: A Small Language Model for Steerable and Scalable Content Labeling". A condensed methodology overview is also available in our research talk.

Training Methodology

CoPE-B-A4B-MM inherits and refines the policy-interpretation training methodology pioneered with CoPE-A-9B:

Contradictory example training: identical content samples with systematically contradictory labels across policy variants, forcing the model to learn policy interpretation rather than pattern memorization
Policy-shape diversity: training corpus spans permissive, moderate, and stringent policy variants per topical area
Multimodal-aware fine-tuning: LoRA fine-tuning conducted with the full multimodal forward graph active (image-text-to-text path), which leverages image understanding
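To make the contradictory-example idea concrete, the sketch below shows one way such data could be represented and checked. The `Example` record, the helper name, and the toy rows are all hypothetical illustrations, not the actual training format or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    policy_text: str
    content_text: str
    label: int  # 1 if any policy label applies, else 0


def contradictory_contents(examples):
    """Return contents that are labeled both 0 and 1 under different policies."""
    labels_by_content = {}
    for ex in examples:
        labels_by_content.setdefault(ex.content_text, set()).add(ex.label)
    return [c for c, labels in labels_by_content.items() if labels == {0, 1}]


# Toy rows: the same content flips label when the policy stance changes.
toy = [
    Example("Permissive: profanity is allowed.", "That movie was damn good.", 0),
    Example("Stringent: any profanity is flagged.", "That movie was damn good.", 1),
    Example("Stringent: any profanity is flagged.", "That movie was good.", 0),
]
print(contradictory_contents(toy))  # ['That movie was damn good.']
```

Content that appears under only one label carries no such contrast, which is why the method pairs identical content with differing policies.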

Training Data

Policy texts authored by the CoPE team across multiple topic areas
Content data sourced from publicly-accessible internet forums
Labels produced via a 4-pass LLM-assisted relabeling pipeline

Data Integrity

The training corpus and the evaluation test sets are disjoint splits of Zentropi's internal dataset. The held-out test split shares zero content_text or policy_text samples with the training split. Test policies are novel policies so the evaluation measures policy-text generalization, not policy memorization.
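A disjointness claim of this kind can be verified with a set intersection per field. The sketch below assumes each split is a list of dicts keyed by `content_text` and `policy_text` (the field names used above); the function name and row layout are assumptions for illustration.

```python
def shared_values(train_rows, test_rows, fields=("content_text", "policy_text")):
    """Return, per field, the set of values that occur in both splits."""
    overlap = {}
    for field in fields:
        train_values = {row[field] for row in train_rows}
        test_values = {row[field] for row in test_rows}
        overlap[field] = train_values & test_values
    return overlap


train = [{"content_text": "a", "policy_text": "P1"}]
test = [{"content_text": "b", "policy_text": "P2"}]
# Disjoint splits yield an empty set for every field.
assert all(not values for values in shared_values(train, test).values())
```

Exact string matching will miss near-duplicates, so a stricter check would normalize whitespace and case first.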

Performance Evaluation

Methodology

CoPE-B-A4B-MM was evaluated on a held-out text-based test set of (content, policy) pairs with relabeled ground-truth labels, against a broad slate of frontier proprietary models, open-weight reasoning models, and fixed-taxonomy safety classifiers. All numbers below are on the relabeled test set. Tables are sorted by F1 descending; CoPE models are in bold.
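For readers reproducing these metrics on their own labeled data, the sketch below computes precision, recall, and F1 for binary labels using the standard definitions. The function name is illustrative, and this is not the evaluation harness used for the tables.

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall, and F1, treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

F1 is the harmonic mean of precision and recall, so a model with very high recall but low precision still scores modestly.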

Note: To our knowledge, a standard benchmark for policy-steerable image classification does not yet exist, and we are working with the community to formalize one. The numbers below evaluate the text-classification path; image-classification capability is functional but not yet covered by a public benchmark.

Benchmark Results

Overview: Average Across Topics

Unweighted mean of the per-category Precision / Recall / F1 below. Every category carries equal weight. Per-category results follow the overview table.
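The unweighted mean can be reproduced from the per-category tables in this card. The sketch below does so for CoPE-B-A4B-MM, using the rounded (precision, recall, F1) values as printed; the dictionary layout is illustrative.

```python
# (precision, recall, f1) for CoPE-B-A4B-MM, copied from the per-category tables.
cope_b_mm = {
    "drugs":      (0.75, 0.90, 0.82),
    "harassment": (0.58, 0.93, 0.72),
    "hate":       (0.93, 0.82, 0.87),
    "self_harm":  (0.95, 0.92, 0.94),
    "sexual":     (0.86, 0.98, 0.92),
    "toxic":      (0.80, 0.73, 0.76),
    "violence":   (0.96, 0.56, 0.71),
}


def macro_average(per_category):
    """Unweighted mean of each metric across categories."""
    n = len(per_category)
    columns = zip(*per_category.values())
    return tuple(sum(col) / n for col in columns)


precision, recall, f1 = macro_average(cope_b_mm)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.83 0.83 0.82
```

Recomputing from two-decimal inputs gives a recall of 0.83 rather than the 0.84 in the overview table; a difference of this size is what one would expect if the table averaged unrounded per-category values, though the card does not say so.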

Model	Precision	Recall	F1 Score	Self-hostable	Single-pass*	Multimodal
CoPE-B-A4B-MM	0.83	0.84	0.82	✓	✓	✓
CoPE-B-A4B	0.74	0.90	0.81	✓	✓
CoPE-A-9B	0.74	0.88	0.80	✓	✓
GPT-5.4 (default reasoning)	0.68	0.95	0.78			✓
Gemini-3.5-Flash	0.69	0.91	0.78		✓	✓
Gemma-4-26B-A4B-it	0.67	0.90	0.76	✓	✓	✓
Claude-Opus-4.6	0.65	0.95	0.75		✓	✓
Gemini-3.1-Flash-Lite	0.69	0.86	0.75		✓	✓
gpt-oss-120b (default reasoning)	0.68	0.88	0.75	✓
gpt-oss-safeguard-20b (default reasoning)	0.70	0.82	0.75	✓
gpt-oss-120b (low reasoning)	0.66	0.86	0.73	✓
gpt-oss-20b (default reasoning)	0.65	0.88	0.72	✓
gpt-oss-20b (low reasoning)	0.63	0.89	0.72	✓
Claude-Sonnet-4.6	0.61	0.89	0.71		✓	✓
GPT-5-mini (default reasoning)	0.56	0.97	0.69			✓
Claude-Haiku-4.5	0.56	0.68	0.60		✓	✓
ShieldGemma-9B	0.54	0.75	0.58	✓	✓
LlamaGuard4-12B	0.50	0.66	0.52	✓	✓	✓

* Single-pass means the model produces its classification in one forward pass, with no internal reasoning chain — enabling lower latency and cost than reasoning-based models that may emit thousands of intermediate tokens per decision.

Drugs Classification

Model	Precision	Recall	F1 Score
Claude-Opus-4.6	0.78	0.97	0.87
CoPE-B-A4B-MM	0.75	0.90	0.82
Gemini-3.5-Flash	0.70	1.0	0.82
Gemma-4-26B-A4B-it	0.69	0.97	0.81
Claude-Sonnet-4.6	0.65	0.93	0.77
CoPE-B-A4B	0.66	0.90	0.76
GPT-5.4 (default reasoning)	0.61	1.0	0.76
gpt-oss-safeguard-20b (default reasoning)	0.68	0.83	0.75
gpt-oss-120b (default reasoning)	0.59	1.0	0.74
Gemini-3.1-Flash-Lite	0.57	1.0	0.72
gpt-oss-20b (default reasoning)	0.56	0.97	0.71
gpt-oss-120b (low reasoning)	0.53	1.0	0.69
CoPE-A-9B	0.57	0.83	0.68
GPT-5-mini (default reasoning)	0.49	1.0	0.66
gpt-oss-20b (low reasoning)	0.50	0.97	0.66
ShieldGemma-9B	0.42	1.0	0.59
LlamaGuard4-12B	0.39	0.90	0.55
Claude-Haiku-4.5	0.68	0.43	0.53

Harassment Classification

Model	Precision	Recall	F1 Score
Gemini-3.5-Flash	0.63	0.91	0.75
GPT-5.4 (default reasoning)	0.60	0.95	0.74
gpt-oss-120b (low reasoning)	0.63	0.87	0.73
CoPE-B-A4B	0.57	0.96	0.72
CoPE-B-A4B-MM	0.58	0.93	0.72
CoPE-A-9B	0.60	0.88	0.71
Gemini-3.1-Flash-Lite	0.58	0.91	0.71
gpt-oss-120b (default reasoning)	0.58	0.85	0.69
gpt-oss-20b (default reasoning)	0.56	0.90	0.69
gpt-oss-20b (low reasoning)	0.56	0.89	0.69
gpt-oss-safeguard-20b (default reasoning)	0.59	0.79	0.68
Gemma-4-26B-A4B-it	0.49	0.93	0.65
Claude-Opus-4.6	0.44	0.98	0.61
GPT-5-mini (default reasoning)	0.45	0.95	0.61
Claude-Sonnet-4.6	0.39	0.94	0.56
Claude-Haiku-4.5	0.44	0.60	0.51
ShieldGemma-9B	0.32	0.60	0.42
LlamaGuard4-12B	0.25	0.44	0.32

Hate Speech Classification

Model	Precision	Recall	F1 Score
GPT-5.4 (default reasoning)	0.88	0.93	0.91
Gemini-3.1-Flash-Lite	0.92	0.84	0.88
Claude-Opus-4.6	0.78	0.98	0.87
CoPE-B-A4B	0.86	0.88	0.87
CoPE-B-A4B-MM	0.93	0.82	0.87
Gemma-4-26B-A4B-it	0.89	0.84	0.86
gpt-oss-120b (default reasoning)	0.80	0.94	0.86
gpt-oss-safeguard-20b (default reasoning)	0.81	0.87	0.84
Gemini-3.5-Flash	0.77	0.90	0.83
gpt-oss-120b (low reasoning)	0.74	0.94	0.83
gpt-oss-20b (default reasoning)	0.74	0.91	0.82
CoPE-A-9B	0.71	0.94	0.81
GPT-5-mini (default reasoning)	0.68	0.99	0.80
gpt-oss-20b (low reasoning)	0.67	0.93	0.78
Claude-Sonnet-4.6	0.66	0.92	0.77
ShieldGemma-9B	0.56	0.98	0.71
LlamaGuard4-12B	0.56	0.87	0.68
Claude-Haiku-4.5	0.54	0.73	0.62

Self-Harm Content Classification

Model	Precision	Recall	F1 Score
CoPE-B-A4B-MM	0.95	0.92	0.94
GPT-5-mini (default reasoning)	0.91	0.98	0.94
CoPE-B-A4B	0.91	0.94	0.93
GPT-5.4 (default reasoning)	0.95	0.90	0.93
Claude-Sonnet-4.6	0.88	0.97	0.92
Gemini-3.5-Flash	0.86	0.98	0.92
Claude-Opus-4.6	0.85	0.98	0.91
CoPE-A-9B	0.93	0.89	0.91
gpt-oss-120b (default reasoning)	0.95	0.87	0.91
gpt-oss-20b (default reasoning)	0.94	0.88	0.91
Claude-Haiku-4.5	0.88	0.90	0.89
Gemini-3.1-Flash-Lite	0.83	0.96	0.89
gpt-oss-safeguard-20b (default reasoning)	0.93	0.86	0.89
gpt-oss-120b (low reasoning)	0.96	0.79	0.87
gpt-oss-20b (low reasoning)	0.95	0.80	0.87
Gemma-4-26B-A4B-it	0.73	0.97	0.84
ShieldGemma-9B	0.72	0.89	0.80
LlamaGuard4-12B	0.80	0.71	0.75

Sexual Content Classification

Model	Precision	Recall	F1 Score
CoPE-A-9B	0.98	0.93	0.95
CoPE-B-A4B-MM	0.86	0.98	0.92
gpt-oss-120b (default reasoning)	0.94	0.89	0.92
Gemini-3.5-Flash	0.88	0.93	0.90
gpt-oss-safeguard-20b (default reasoning)	0.91	0.89	0.90
Claude-Opus-4.6	0.88	0.89	0.89
gpt-oss-120b (low reasoning)	0.88	0.91	0.89
gpt-oss-20b (low reasoning)	0.94	0.84	0.89
GPT-5.4 (default reasoning)	0.82	0.95	0.88
gpt-oss-20b (default reasoning)	0.94	0.82	0.88
Gemma-4-26B-A4B-it	0.81	0.93	0.87
Gemini-3.1-Flash-Lite	0.80	0.93	0.86
Claude-Sonnet-4.6	0.90	0.79	0.84
ShieldGemma-9B	0.91	0.77	0.83
Claude-Haiku-4.5	0.74	0.91	0.82
GPT-5-mini (default reasoning)	0.72	0.95	0.82
CoPE-B-A4B	0.69	0.98	0.81
LlamaGuard4-12B	0.83	0.36	0.50

Note: The precision-recall trade-off on sexual content classification for CoPE-B-A4B-MM differs slightly from CoPE-A-9B; both remain strong. Either model can be further tuned to your operating point by creating a policy that is well-matched to your golden dataset. Tools to do so are available at zentropi.ai.

Toxic Speech Classification

Model	Precision	Recall	F1 Score
CoPE-B-A4B	0.76	0.85	0.80
CoPE-A-9B	0.67	0.91	0.77
CoPE-B-A4B-MM	0.80	0.73	0.76
Gemini-3.5-Flash	0.56	0.94	0.70
Claude-Sonnet-4.6	0.53	0.94	0.67
Gemini-3.1-Flash-Lite	0.51	1.0	0.67
Gemma-4-26B-A4B-it	0.48	1.0	0.65
Claude-Opus-4.6	0.46	0.97	0.63
gpt-oss-safeguard-20b (default reasoning)	0.46	0.94	0.62
ShieldGemma-9B	0.43	0.97	0.60
gpt-oss-20b (low reasoning)	0.43	0.97	0.59
GPT-5.4 (default reasoning)	0.41	1.0	0.58
gpt-oss-120b (default reasoning)	0.41	1.0	0.58
gpt-oss-120b (low reasoning)	0.40	1.0	0.57
gpt-oss-20b (default reasoning)	0.40	0.97	0.57
Claude-Haiku-4.5	0.43	0.73	0.54
GPT-5-mini (default reasoning)	0.32	1.0	0.49
LlamaGuard4-12B	0.37	0.52	0.43

For background on the unique nature of the toxicity policy we tested, see this blog post.

Violence Classification

Model	Precision	Recall	F1 Score
CoPE-A-9B	0.72	0.79	0.76
CoPE-B-A4B	0.70	0.79	0.75
CoPE-B-A4B-MM	0.96	0.56	0.71
GPT-5.4 (default reasoning)	0.52	0.90	0.66
Gemma-4-26B-A4B-it	0.57	0.69	0.63
Gemini-3.5-Flash	0.45	0.69	0.55
gpt-oss-120b (default reasoning)	0.48	0.62	0.54
gpt-oss-safeguard-20b (default reasoning)	0.51	0.56	0.54
gpt-oss-20b (low reasoning)	0.39	0.82	0.53
Claude-Opus-4.6	0.36	0.87	0.51
gpt-oss-120b (low reasoning)	0.48	0.54	0.51
gpt-oss-20b (default reasoning)	0.40	0.69	0.51
GPT-5-mini (default reasoning)	0.33	0.95	0.49
Gemini-3.1-Flash-Lite	0.62	0.41	0.49
Claude-Sonnet-4.6	0.29	0.77	0.42
LlamaGuard4-12B	0.27	0.79	0.40
Claude-Haiku-4.5	0.24	0.44	0.31
ShieldGemma-9B	0.40	0.051	0.091

Performance Analysis

In short, CoPE-B-A4B-MM delivers policy-steerable classification accuracy that matches or exceeds frontier proprietary models — across both text and images/video, under a single policy — while being a fraction of their size, far faster and cheaper to run, and deployable locally for greater security.

Specifically, CoPE-B-A4B-MM delivers an unweighted-average F1 of 0.82 — ahead of GPT-5.4 at 0.78 (using default reasoning) and ahead of its predecessor CoPE-A-9B at 0.80. Its text-only sibling CoPE-B-A4B is similarly strong at 0.81.

The fixed-taxonomy safety classifiers (LlamaGuard4-12B, ShieldGemma-9B) trail CoPE-B-A4B-MM by 0.24 or more absolute on overall F1 (0.52 and 0.58 versus 0.82). This is consistent with their built-in-taxonomy design: when asked to evaluate against a user-supplied policy, they tend to over-fire or miss off-taxonomy criteria.

Beyond raw F1, CoPE-B-A4B-MM's primary upgrade over CoPE-A-9B is in policy steerability — the model's ability to follow custom policy stances on the same content rather than apply a fixed harm taxonomy. A dedicated steerability benchmark with full methodology and head-to-head model comparison will be published separately.

The multimodal variant extends this capability to image/video classification using the same policy-interpretation framework — a single CoPE policy applies equally to text content and image content presented under that policy.

Migrating from CoPE-A

If you're currently using CoPE-A-9B and moving to CoPE-B-A4B-MM, three things to flag for the text path (image classification is new in CoPE-B and has no CoPE-A equivalent to migrate from):

1. CoPE-B uses the Gemma-4 chat template

CoPE-B's prompt must be passed through apply_chat_template as a user-turn message — the answer comes back as the assistant-turn output. If your CoPE-A code path raw-concatenates the prompt directly, that pattern will not work with CoPE-B. See the Input Format section above or the runnable Colab notebook for the exact pattern.

Note also that the CoPE-B prompt is leaner than CoPE-A's: there is no INSTRUCTIONS header or ANSWER footer to include — the chat template's role markers replace them.

2. Recalibrate confidence thresholds

CoPE-B is on average more confident than CoPE-A — it concentrates more probability mass on its answer token. If you use the output token probability (or logprob) as a confidence signal for downstream routing or thresholding, your CoPE-A thresholds will not transfer directly. Recalibrate against a labeled sample of your own traffic before relying on the old thresholds.
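One way to recalibrate is to turn the two answer-token logprobs into a score and then sweep thresholds on a labeled sample. The sketch below assumes you can obtain logprobs for the "0" and "1" tokens from your serving stack; both function names are hypothetical, and maximizing F1 is only one possible selection criterion.

```python
import math


def confidence_from_logprobs(logprob_0: float, logprob_1: float) -> float:
    """P(answer == "1"), renormalized over the two answer tokens."""
    m = max(logprob_0, logprob_1)          # subtract the max for stability
    e0 = math.exp(logprob_0 - m)
    e1 = math.exp(logprob_1 - m)
    return e1 / (e0 + e1)


def best_threshold(scores, labels, candidates=None):
    """Return (threshold, f1) maximizing F1 on a labeled sample."""
    candidates = candidates or [i / 100 for i in range(1, 100)]
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:                    # keep the lowest best threshold
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

If your routing needs a precision floor instead, replace the F1 criterion with the lowest threshold whose precision on the sample meets that floor.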

3. Re-optimize policies for CoPE-B

Policies that were optimized for CoPE-A may not be optimal for CoPE-B. CoPE-B's improved policy interpretation can extract more nuanced criteria from a policy than CoPE-A could, which sometimes changes the optimal phrasing. We recommend running existing CoPE-A policies through the Zentropi platform, which has CoPE-B-aware policy authoring tools, to refresh them against a labeled golden dataset.

Intended Applications

Primary Use Cases

Content Labeling
- Real-time content moderation for both text and images/video
- Batch processing of multimedia content
- Policy-driven content classification at scale
LLM Guardrails
- Input prompt risk assessment (text + image/video inputs)
- Output answer risk assessment
- NB: Not yet optimized for agentic patterns
Content Scoring
- Feature generation for social feed ranking
- Language model training data filtering
- Image content review against custom policies

See also these case studies for how other organizations are using CoPE's powerful classification capabilities to advance their work.

Prohibited and Discouraged Uses

In addition to any restrictions in your commercial license, the following applications fall outside the intended scope of the model and may produce poor or unsafe results:

Surveillance applications
Use cases beyond the stated technical limitations (see below)
Zero-shot use without human review for high-stakes moderation decisions

Limitations and Constraints

Current Limitations

Context Length: Limited to 256K tokens (combined policy and content) — a 32x increase over CoPE-A-9B's 8K limit
Language Support: Currently optimized for US English only. Performance will degrade for other languages and locales.
Knowledge Constraints: Cannot make classifications requiring external verification (e.g., misinformation) unless explicitly defined in the provided context
Scope: Binary classification only (i.e., presence/absence of matching labels)

Ethical Considerations

Bias and Fairness

While comprehensive bias evaluation is still ongoing, users should:

Implement careful policy design to mitigate potential biases
Monitor classification patterns across different demographic groups
Contribute problematic examples to our bias assessment efforts
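Monitoring across groups can start with something as simple as comparing flag rates. The sketch below assumes you already have a group annotation for each classified item from your own data (the model does not produce one); the function name and record layout are illustrative.

```python
from collections import defaultdict


def positive_rate_by_group(records):
    """records: iterable of (group, label) pairs, with label in {0, 1}."""
    counts = defaultdict(lambda: [0, 0])   # group -> [positives, total]
    for group, label in records:
        counts[group][0] += label
        counts[group][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


rates = positive_rate_by_group([("a", 1), ("a", 0), ("b", 1), ("b", 1)])
print(rates)  # {'a': 0.5, 'b': 1.0}
```

A gap in flag rates is a prompt for review rather than proof of bias, since base rates can differ between groups.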

Safety Measures

The model's binary classification nature inherently limits certain risks, but users should:

Maintain appropriate human oversight
Regularly audit classification decisions
Implement robust observability systems

Running the Model

Sample Policies

CoPE-B-A4B-MM was evaluated against — and works well with — Zentropi's seven public reference policies covering the harm areas the model was trained on. These apply equally to text and image content under the same policy text, are ready to use, and serve as good starting points for custom policy authoring:

Important: The strength of the CoPE system is that it interprets your rules, so you are not stuck with anybody else's definitions, including ours. Treat the policies above as examples and adapt them to your platform's specific needs. For custom policies, Zentropi provides a guided authoring workflow that optimizes policy structure for CoPE given your labeled 'golden' dataset.

via Hosted API

The easiest way to get started with this model is to use it through the Zentropi API, which has a very generous free tier. Just create an account and mint an API key.

via Direct Inference (Python)

To call the model directly via Transformers, see this runnable Colab notebook. It demonstrates loading CoPE-B with the Gemma-4 chat template and shows a complete worked example end-to-end. The multimodal variant uses the same calling convention, with images attached to the user-turn content (see the Input Format section above for an image-attachment example).

via Self-Hosting (vLLM, commercial license required)

Self-hosting CoPE-B-A4B-MM requires a subscription to Zentropi (see License). With weights provisioned under a license, the model can be served under vLLM. H200 is recommended:

vllm serve zentropi-ai/cope-b-a4b-mm \
  --dtype bfloat16 \
  --max-model-len 256000

For A100 80GB deployment, add --max-num-batched-tokens 8192 to override vLLM's auto-sized scheduler cap (otherwise the multimodal-item token budget exceeds the default batched-tokens cap and serving will fail to start).
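Once the server is up, vLLM exposes an OpenAI-compatible chat endpoint. The sketch below only builds the request body; `build_chat_request` is a hypothetical helper, and the choices of `max_tokens=1`, greedy decoding, and `top_logprobs=2` are assumptions based on the single-token output described above, not documented Zentropi settings.

```python
import base64


def build_chat_request(cope_prompt, image_bytes=None, mime="image/png",
                       model="zentropi-ai/cope-b-a4b-mm"):
    """Build an OpenAI-style chat completion payload for a CoPE query."""
    content = [{"type": "text", "text": cope_prompt}]
    if image_bytes is not None:
        # Images are passed inline as a base64 data URL.
        encoded = base64.b64encode(image_bytes).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{encoded}"},
        })
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": 1,       # assumption: the answer is one token
        "temperature": 0.0,    # assumption: greedy decoding
        "logprobs": True,
        "top_logprobs": 2,     # enough to see both "0" and "1"
    }
```

The payload would be sent as JSON to the server's `/v1/chat/completions` route, for example on `http://localhost:8000` if the server runs locally on vLLM's default port.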

Maintenance and Updates

Update Schedule

Annual releases planned
Regular performance improvements
Community-driven feature enhancements

Future Roadmap Focus

Full benchmarking of image-classification capability
Performance optimizations (quantized variants)
Multilingual and locale support

Community and Support

For any technical questions or comments, please join our HuggingFace community forum or the Roost model community. You can share your feedback, suggest new areas, or pick our brains about anything. If you'd prefer a more private discussion, you can also email us at info@zentropi.ai.

About the Developer

CoPE-B-A4B-MM is developed and maintained by Zentropi, a public benefit company focused on making content classification simple and powerful. The project represents a collaborative effort between industry experts and researchers to advance the state of the art in content labeling technology.

License

CoPE-B-A4B-MM is available exclusively to Zentropi subscribers. Access and use of the model — including downloading weights, self-hosting, and using the hosted Zentropi API — are governed by the terms and conditions of the Zentropi Master Services Agreement (MSA) and the subscription tier associated with your account.

To subscribe or request access for evaluation, visit zentropi.ai or contact info@zentropi.ai.

If your use case does not require image understanding, the text-only companion model zentropi-ai/cope-b-a4b is released under the open-source Apache 2.0 license and may be used without a Zentropi subscription.

Citation

If you use CoPE in your research, please cite our paper:

@article{cope2025,
  title   = {CoPE: A Small Language Model for Steerable and Scalable Content Labeling},
  author  = {Chakrabarti, Willner, et al.},
  journal = {arXiv preprint arXiv:2512.18027},
  year    = {2025},
  url     = {https://arxiv.org/abs/2512.18027}
}

Last Updated: May 27, 2026