Access CoPE-B-A4B-MM (commercial license required)
CoPE-B-A4B-MM is a proprietary multimodal model from Zentropi. Access requires a commercial license. Please provide your contact information and intended use case so we can route your request.
CoPE-B-A4B-MM: The COntent Policy Evaluator Model (Multimodal Variant)
Model Overview
CoPE-B-A4B-MM is the multimodal 2nd-generation Content Policy Evaluator model from Zentropi, built on Google's Gemma-4-26B-A4B-it Mixture-of-Experts architecture. It performs accurate content classification — for both text and images/video — based on developer-customizable policies.
This is the multimodal variant. It accepts content as text, images/video, or both, classified against your policy. For text-only deployments, see also zentropi-ai/cope-b-a4b — an open, text-only companion model.
Full methodology, training recipe, and evaluation details behind CoPE are described in our paper: "CoPE: A Small Language Model for Steerable and Scalable Content Labeling" (arXiv:2512.18027).
Key Features
- Native visual understanding: policy-conditioned image/video classification with the same prompt format as text
- Improved steerability and context length vs CoPE-A-9B
- Policy-adaptive content classification across text and visuals (no fixed taxonomy)
- High-accuracy, low-latency binary labels
- Mixture-of-Experts efficiency: 25.2B total / 3.8B active parameters
- Frontier-level capability at consumer-GPU inference cost
Getting Started
You can use CoPE-B-A4B-MM (subject to the commercial license; see the License section below) in three ways:
- Zentropi API — fastest path, with a generous free tier (no infra required)
- Self-hosted vLLM (H200 recommended) — for production-scale serving on your own infrastructure
- Direct inference in Python — load via Transformers (text + image inputs); see this Colab notebook for a working example
See the Running the Model section below for details on each.
Technical Specifications
Model Architecture
CoPE-B-A4B-MM is built on top of Google's Gemma-4-26B-A4B-it:
- 25.2 billion total parameters across 128 experts per layer
- 3.8 billion active parameters per forward pass (top-k=8 of 128 experts)
- 256K-token context window
- Native bidirectional vision tower
CoPE-B-A4B-MM was fine-tuned with LoRA then merged into the base so it can be run directly without adapters.
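As a quick sanity check on the figures above, the sketch below computes what share of the parameters and of the experts is active per forward pass. It uses only the numbers quoted in this section; the variable names are illustrative.

```python
# Figures quoted in the Model Architecture list above.
TOTAL_PARAMS_B = 25.2      # total parameters, in billions
ACTIVE_PARAMS_B = 3.8      # active parameters per forward pass, in billions
EXPERTS_PER_LAYER = 128
TOP_K = 8                  # experts routed per token in each layer

active_param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
routed_expert_fraction = TOP_K / EXPERTS_PER_LAYER

print(f"active parameters: {active_param_fraction:.1%}")        # 15.1%
print(f"routed experts per layer: {routed_expert_fraction:.2%}")  # 6.25%
```

The active-parameter share is larger than the routed-expert share, presumably because weights outside the experts (attention, embeddings) are used for every token. That explanation is an assumption about the base architecture, not something stated in this card.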
Input Format
The CoPE prompt can be paired with text content, image/video content, or both:
Examine the given POLICY and determine if the given CONTENT meets the criteria for ANY of the LABELS. Answer "1" if yes, and "0" if no.
POLICY
======
[policy text]
CONTENT
=======
[content text, or "[image]/[video]" if attaching an image]
For text-only inputs, pass the prompt as a single user-turn text message:
messages = [{"role": "user", "content": cope_prompt}]
inputs = processor.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
For image inputs, attach the image to the user-turn content:
messages = [{"role": "user", "content": [
    {"type": "text", "text": cope_prompt},
    {"type": "image", "image": pil_image}
]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
)
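The snippets above assume a ready-made `cope_prompt` string. A minimal sketch of how that string and the `messages` list might be assembled is shown here. The helper names `build_cope_prompt` and `build_messages` are hypothetical, and the blank-line placement between sections is an assumption; check the exact whitespace against the Colab notebook before relying on it.

```python
INSTRUCTION = (
    'Examine the given POLICY and determine if the given CONTENT meets the '
    'criteria for ANY of the LABELS. Answer "1" if yes, and "0" if no.'
)

def build_cope_prompt(policy_text: str, content_text: str) -> str:
    """Assemble the CoPE prompt from a policy and a piece of content.

    Blank lines between sections are an assumption, not confirmed by the card.
    """
    return (
        f"{INSTRUCTION}\n\n"
        f"POLICY\n======\n{policy_text.strip()}\n\n"
        f"CONTENT\n=======\n{content_text.strip()}"
    )

def build_messages(cope_prompt: str, image=None) -> list:
    """Wrap the prompt as a single user turn, optionally attaching an image."""
    if image is None:
        return [{"role": "user", "content": cope_prompt}]
    return [{"role": "user", "content": [
        {"type": "text", "text": cope_prompt},
        {"type": "image", "image": image},
    ]}]
```

For an image input, the caller would pass the placeholder from the template as `content_text` and the PIL image as `image`.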
Important: Creating high-quality labeling criteria is the key to unlocking superior performance, so we've created the Zentropi system to enable rapid generation, testing, and tuning of policies that are optimized for CoPE interpretability. It is free for anyone to get started.
Output Format
CoPE-B-A4B-MM provides binary classification outputs as a single token:
- 0: None of the policy labels apply to the content
- 1: One or more policy labels apply to the content
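Because the answer is a single token, downstream code can map it to a boolean with a strict parser. The sketch below is one way to do that; `parse_cope_label` is a hypothetical helper, and raising on unexpected text is a design choice so that malformed output is never silently treated as "no violation".

```python
def parse_cope_label(generated_text: str) -> bool:
    """Map the model's single-token answer to a boolean.

    Returns True for "1" (a label applies) and False for "0" (none apply).
    Raises ValueError on anything else.
    """
    answer = generated_text.strip()
    if answer == "1":
        return True
    if answer == "0":
        return False
    raise ValueError(f"unexpected CoPE output: {generated_text!r}")
```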
System Requirements
- Recommended: H200 (141GB VRAM) for production serving via vLLM. A100 80GB also supported with the --max-num-batched-tokens 8192 override
- Inference latency comparable to a 4B-parameter dense model at batch=1, due to MoE's sparse activation
- Compatible with vLLM ≥ 0.20.2 for production serving
Training Details
For the full training recipe (hyperparameters, contradictory-policy dataset construction, ablation studies), see our paper: "CoPE: A Small Language Model for Steerable and Scalable Content Labeling". A condensed methodology overview is also available in our research talk.
Training Methodology
CoPE-B-A4B-MM inherits and refines the policy-interpretation training methodology pioneered with CoPE-A-9B:
- Contradictory example training: identical content samples with systematically contradictory labels across policy variants, forcing the model to learn policy interpretation rather than pattern memorization
- Policy-shape diversity: training corpus spans permissive, moderate, and stringent policy variants per topical area
- Multimodal-aware fine-tuning: LoRA fine-tuning conducted with the full multimodal forward graph active (image-text-to-text path), which leverages image understanding
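To make the contradictory-example idea concrete, the sketch below shows the shape such a pair might take: the same content labeled under two policy variants that disagree. The content, the policies, and the `LabeledExample` type are all hypothetical and written only for illustration; they are not drawn from the actual training data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LabeledExample:
    policy_text: str
    content_text: str
    label: int  # 1 = a policy label applies, 0 = none apply

# Hypothetical content and policies, for illustration only.
content = "That referee was garbage tonight."
strict_policy = "Label INSULT: any derogatory remark about a person."
permissive_policy = "Label INSULT: only slurs or threats aimed at a person."

contradictory_pair = [
    LabeledExample(strict_policy, content, 1),
    LabeledExample(permissive_policy, content, 0),
]

def is_contradictory(a: LabeledExample, b: LabeledExample) -> bool:
    """True when two examples share content but differ in policy and label."""
    return (a.content_text == b.content_text
            and a.policy_text != b.policy_text
            and a.label != b.label)
```

A model that keys only on the content cannot get both examples right, which is the pressure toward reading the policy that the first bullet describes.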
Training Data
- Policy texts authored by the CoPE team across multiple topic areas
- Content data sourced from publicly-accessible internet forums
- Labels produced via a 4-pass LLM-assisted relabeling pipeline
Data Integrity
The training corpus and the evaluation test sets are disjoint splits of Zentropi's internal dataset. The held-out test split shares zero content_text or policy_text samples with the training split. Test policies are novel policies so the evaluation measures policy-text generalization, not policy memorization.
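A disjointness claim of this kind can be checked mechanically. The sketch below assumes each split is a list of dicts keyed by the `content_text` and `policy_text` fields named above; the function names are hypothetical and this is not Zentropi's actual validation code.

```python
def shared_values(train_rows, test_rows, field):
    """Return the set of values for `field` that occur in both splits."""
    train_values = {row[field] for row in train_rows}
    test_values = {row[field] for row in test_rows}
    return train_values & test_values

def assert_disjoint(train_rows, test_rows,
                    fields=("content_text", "policy_text")):
    """Raise AssertionError if any field value appears in both splits."""
    for field in fields:
        overlap = shared_values(train_rows, test_rows, field)
        assert not overlap, f"{len(overlap)} shared {field} value(s)"
```

This catches exact string matches only; near-duplicates (whitespace or casing differences, light paraphrases) would need normalization or fuzzy matching on top.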
Performance Evaluation
Methodology
CoPE-B-A4B-MM was evaluated on a held-out text-based test set of (content, policy) pairs with relabeled ground-truth labels, against a broad slate of frontier proprietary models, open-weight reasoning models, and fixed-taxonomy safety classifiers. All numbers below are on the relabeled test set. Tables are sorted by F1 descending; CoPE models are in bold.
Note: A standard benchmark for policy-steerable image-classification does not yet exist to our knowledge, but we are working on formalizing this within the community. The numbers below evaluate the text-classification path; image-classification capability is functional but not yet covered by a public benchmark.
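For readers cross-checking the tables, the sketch below assumes the F1 column is the standard harmonic mean of precision and recall (the card does not state the formula explicitly). It reproduces the published F1 for two CoPE-B-A4B-MM rows from their precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# CoPE-B-A4B-MM, Violence row: precision 0.96, recall 0.56.
violence_f1 = f1_score(0.96, 0.56)
print(round(violence_f1, 2))  # 0.71

# CoPE-B-A4B-MM, Drugs row: precision 0.75, recall 0.90.
drugs_f1 = f1_score(0.75, 0.90)
print(round(drugs_f1, 2))  # 0.82
```

The Violence row shows how the harmonic mean penalizes imbalance: very high precision (0.96) paired with modest recall (0.56) still yields an F1 of only 0.71.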
Benchmark Results
Overview: Average Across Topics
Unweighted mean of the per-category Precision / Recall / F1 below. Every category area carries equal weight. Detailed performance per category follows afterwards.
| Model | Precision | Recall | F1 Score | Self-hostable | Single-pass* | Multimodal |
|---|---|---|---|---|---|---|
| CoPE-B-A4B-MM | 0.83 | 0.84 | 0.82 | ✓ | ✓ | ✓ |
| CoPE-B-A4B | 0.74 | 0.90 | 0.81 | ✓ | ✓ | |
| CoPE-A-9B | 0.74 | 0.88 | 0.80 | ✓ | ✓ | |
| GPT-5.4 (default reasoning) | 0.68 | 0.95 | 0.78 | | | ✓ |
| Gemini-3.5-Flash | 0.69 | 0.91 | 0.78 | | ✓ | ✓ |
| Gemma-4-26B-A4B-it | 0.67 | 0.90 | 0.76 | ✓ | ✓ | ✓ |
| Claude-Opus-4.6 | 0.65 | 0.95 | 0.75 | | ✓ | ✓ |
| Gemini-3.1-Flash-Lite | 0.69 | 0.86 | 0.75 | | ✓ | ✓ |
| gpt-oss-120b (default reasoning) | 0.68 | 0.88 | 0.75 | ✓ | | |
| gpt-oss-safeguard-20b (default reasoning) | 0.70 | 0.82 | 0.75 | ✓ | | |
| gpt-oss-120b (low reasoning) | 0.66 | 0.86 | 0.73 | ✓ | | |
| gpt-oss-20b (default reasoning) | 0.65 | 0.88 | 0.72 | ✓ | | |
| gpt-oss-20b (low reasoning) | 0.63 | 0.89 | 0.72 | ✓ | | |
| Claude-Sonnet-4.6 | 0.61 | 0.89 | 0.71 | | ✓ | ✓ |
| GPT-5-mini (default reasoning) | 0.56 | 0.97 | 0.69 | | | ✓ |
| Claude-Haiku-4.5 | 0.56 | 0.68 | 0.60 | | ✓ | ✓ |
| ShieldGemma-9B | 0.54 | 0.75 | 0.58 | ✓ | ✓ | |
| LlamaGuard4-12B | 0.50 | 0.66 | 0.52 | ✓ | ✓ | ✓ |
* Single-pass means the model produces its classification in one forward pass, with no internal reasoning chain — enabling lower latency and cost than reasoning-based models that may emit thousands of intermediate tokens per decision.
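The unweighted mean described above can be reproduced from the per-category tables that follow. The sketch below does so for CoPE-B-A4B-MM's F1, using the values exactly as published.

```python
# Per-category F1 for CoPE-B-A4B-MM, copied from the tables in this section.
f1_by_category = {
    "Drugs": 0.82,
    "Harassment": 0.72,
    "Hate Speech": 0.87,
    "Self-Harm": 0.94,
    "Sexual Content": 0.92,
    "Toxic Speech": 0.76,
    "Violence": 0.71,
}

# Unweighted (macro) mean: every category counts equally.
macro_f1 = sum(f1_by_category.values()) / len(f1_by_category)
print(round(macro_f1, 2))  # 0.82
```

Because the per-category values are already rounded to two decimals, an average recomputed this way can differ from the published overview row in the last digit.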
Drugs Classification
| Model | Precision | Recall | F1 Score |
|---|---|---|---|
| Claude-Opus-4.6 | 0.78 | 0.97 | 0.87 |
| CoPE-B-A4B-MM | 0.75 | 0.90 | 0.82 |
| Gemini-3.5-Flash | 0.70 | 1.0 | 0.82 |
| Gemma-4-26B-A4B-it | 0.69 | 0.97 | 0.81 |
| Claude-Sonnet-4.6 | 0.65 | 0.93 | 0.77 |
| CoPE-B-A4B | 0.66 | 0.90 | 0.76 |
| GPT-5.4 (default reasoning) | 0.61 | 1.0 | 0.76 |
| gpt-oss-safeguard-20b (default reasoning) | 0.68 | 0.83 | 0.75 |
| gpt-oss-120b (default reasoning) | 0.59 | 1.0 | 0.74 |
| Gemini-3.1-Flash-Lite | 0.57 | 1.0 | 0.72 |
| gpt-oss-20b (default reasoning) | 0.56 | 0.97 | 0.71 |
| gpt-oss-120b (low reasoning) | 0.53 | 1.0 | 0.69 |
| CoPE-A-9B | 0.57 | 0.83 | 0.68 |
| GPT-5-mini (default reasoning) | 0.49 | 1.0 | 0.66 |
| gpt-oss-20b (low reasoning) | 0.50 | 0.97 | 0.66 |
| ShieldGemma-9B | 0.42 | 1.0 | 0.59 |
| LlamaGuard4-12B | 0.39 | 0.90 | 0.55 |
| Claude-Haiku-4.5 | 0.68 | 0.43 | 0.53 |
Harassment Classification
| Model | Precision | Recall | F1 Score |
|---|---|---|---|
| Gemini-3.5-Flash | 0.63 | 0.91 | 0.75 |
| GPT-5.4 (default reasoning) | 0.60 | 0.95 | 0.74 |
| gpt-oss-120b (low reasoning) | 0.63 | 0.87 | 0.73 |
| CoPE-B-A4B | 0.57 | 0.96 | 0.72 |
| CoPE-B-A4B-MM | 0.58 | 0.93 | 0.72 |
| CoPE-A-9B | 0.60 | 0.88 | 0.71 |
| Gemini-3.1-Flash-Lite | 0.58 | 0.91 | 0.71 |
| gpt-oss-120b (default reasoning) | 0.58 | 0.85 | 0.69 |
| gpt-oss-20b (default reasoning) | 0.56 | 0.90 | 0.69 |
| gpt-oss-20b (low reasoning) | 0.56 | 0.89 | 0.69 |
| gpt-oss-safeguard-20b (default reasoning) | 0.59 | 0.79 | 0.68 |
| Gemma-4-26B-A4B-it | 0.49 | 0.93 | 0.65 |
| Claude-Opus-4.6 | 0.44 | 0.98 | 0.61 |
| GPT-5-mini (default reasoning) | 0.45 | 0.95 | 0.61 |
| Claude-Sonnet-4.6 | 0.39 | 0.94 | 0.56 |
| Claude-Haiku-4.5 | 0.44 | 0.60 | 0.51 |
| ShieldGemma-9B | 0.32 | 0.60 | 0.42 |
| LlamaGuard4-12B | 0.25 | 0.44 | 0.32 |
Hate Speech Classification
| Model | Precision | Recall | F1 Score |
|---|---|---|---|
| GPT-5.4 (default reasoning) | 0.88 | 0.93 | 0.91 |
| Gemini-3.1-Flash-Lite | 0.92 | 0.84 | 0.88 |
| Claude-Opus-4.6 | 0.78 | 0.98 | 0.87 |
| CoPE-B-A4B | 0.86 | 0.88 | 0.87 |
| CoPE-B-A4B-MM | 0.93 | 0.82 | 0.87 |
| Gemma-4-26B-A4B-it | 0.89 | 0.84 | 0.86 |
| gpt-oss-120b (default reasoning) | 0.80 | 0.94 | 0.86 |
| gpt-oss-safeguard-20b (default reasoning) | 0.81 | 0.87 | 0.84 |
| Gemini-3.5-Flash | 0.77 | 0.90 | 0.83 |
| gpt-oss-120b (low reasoning) | 0.74 | 0.94 | 0.83 |
| gpt-oss-20b (default reasoning) | 0.74 | 0.91 | 0.82 |
| CoPE-A-9B | 0.71 | 0.94 | 0.81 |
| GPT-5-mini (default reasoning) | 0.68 | 0.99 | 0.80 |
| gpt-oss-20b (low reasoning) | 0.67 | 0.93 | 0.78 |
| Claude-Sonnet-4.6 | 0.66 | 0.92 | 0.77 |
| ShieldGemma-9B | 0.56 | 0.98 | 0.71 |
| LlamaGuard4-12B | 0.56 | 0.87 | 0.68 |
| Claude-Haiku-4.5 | 0.54 | 0.73 | 0.62 |
Self-Harm Content Classification
| Model | Precision | Recall | F1 Score |
|---|---|---|---|
| CoPE-B-A4B-MM | 0.95 | 0.92 | 0.94 |
| GPT-5-mini (default reasoning) | 0.91 | 0.98 | 0.94 |
| CoPE-B-A4B | 0.91 | 0.94 | 0.93 |
| GPT-5.4 (default reasoning) | 0.95 | 0.90 | 0.93 |
| Claude-Sonnet-4.6 | 0.88 | 0.97 | 0.92 |
| Gemini-3.5-Flash | 0.86 | 0.98 | 0.92 |
| Claude-Opus-4.6 | 0.85 | 0.98 | 0.91 |
| CoPE-A-9B | 0.93 | 0.89 | 0.91 |
| gpt-oss-120b (default reasoning) | 0.95 | 0.87 | 0.91 |
| gpt-oss-20b (default reasoning) | 0.94 | 0.88 | 0.91 |
| Claude-Haiku-4.5 | 0.88 | 0.90 | 0.89 |
| Gemini-3.1-Flash-Lite | 0.83 | 0.96 | 0.89 |
| gpt-oss-safeguard-20b (default reasoning) | 0.93 | 0.86 | 0.89 |
| gpt-oss-120b (low reasoning) | 0.96 | 0.79 | 0.87 |
| gpt-oss-20b (low reasoning) | 0.95 | 0.80 | 0.87 |
| Gemma-4-26B-A4B-it | 0.73 | 0.97 | 0.84 |
| ShieldGemma-9B | 0.72 | 0.89 | 0.80 |
| LlamaGuard4-12B | 0.80 | 0.71 | 0.75 |
Sexual Content Classification
| Model | Precision | Recall | F1 Score |
|---|---|---|---|
| CoPE-A-9B | 0.98 | 0.93 | 0.95 |
| CoPE-B-A4B-MM | 0.86 | 0.98 | 0.92 |
| gpt-oss-120b (default reasoning) | 0.94 | 0.89 | 0.92 |
| Gemini-3.5-Flash | 0.88 | 0.93 | 0.90 |
| gpt-oss-safeguard-20b (default reasoning) | 0.91 | 0.89 | 0.90 |
| Claude-Opus-4.6 | 0.88 | 0.89 | 0.89 |
| gpt-oss-120b (low reasoning) | 0.88 | 0.91 | 0.89 |
| gpt-oss-20b (low reasoning) | 0.94 | 0.84 | 0.89 |
| GPT-5.4 (default reasoning) | 0.82 | 0.95 | 0.88 |
| gpt-oss-20b (default reasoning) | 0.94 | 0.82 | 0.88 |
| Gemma-4-26B-A4B-it | 0.81 | 0.93 | 0.87 |
| Gemini-3.1-Flash-Lite | 0.80 | 0.93 | 0.86 |
| Claude-Sonnet-4.6 | 0.90 | 0.79 | 0.84 |
| ShieldGemma-9B | 0.91 | 0.77 | 0.83 |
| Claude-Haiku-4.5 | 0.74 | 0.91 | 0.82 |
| GPT-5-mini (default reasoning) | 0.72 | 0.95 | 0.82 |
| CoPE-B-A4B | 0.69 | 0.98 | 0.81 |
| LlamaGuard4-12B | 0.83 | 0.36 | 0.50 |
Note: The precision-recall trade-off on sexual content classification for CoPE-B-A4B-MM differs slightly from CoPE-A-9B; both remain strong. Either model can be further tuned to your operating point by creating a policy that is well-matched to your golden dataset. Tools to do so are available at zentropi.ai.
Toxic Speech Classification
| Model | Precision | Recall | F1 Score |
|---|---|---|---|
| CoPE-B-A4B | 0.76 | 0.85 | 0.80 |
| CoPE-A-9B | 0.67 | 0.91 | 0.77 |
| CoPE-B-A4B-MM | 0.80 | 0.73 | 0.76 |
| Gemini-3.5-Flash | 0.56 | 0.94 | 0.70 |
| Claude-Sonnet-4.6 | 0.53 | 0.94 | 0.67 |
| Gemini-3.1-Flash-Lite | 0.51 | 1.0 | 0.67 |
| Gemma-4-26B-A4B-it | 0.48 | 1.0 | 0.65 |
| Claude-Opus-4.6 | 0.46 | 0.97 | 0.63 |
| gpt-oss-safeguard-20b (default reasoning) | 0.46 | 0.94 | 0.62 |
| ShieldGemma-9B | 0.43 | 0.97 | 0.60 |
| gpt-oss-20b (low reasoning) | 0.43 | 0.97 | 0.59 |
| GPT-5.4 (default reasoning) | 0.41 | 1.0 | 0.58 |
| gpt-oss-120b (default reasoning) | 0.41 | 1.0 | 0.58 |
| gpt-oss-120b (low reasoning) | 0.40 | 1.0 | 0.57 |
| gpt-oss-20b (default reasoning) | 0.40 | 0.97 | 0.57 |
| Claude-Haiku-4.5 | 0.43 | 0.73 | 0.54 |
| GPT-5-mini (default reasoning) | 0.32 | 1.0 | 0.49 |
| LlamaGuard4-12B | 0.37 | 0.52 | 0.43 |
For background on the unique nature of the toxicity policy we tested, see this blog post.
Violence Classification
| Model | Precision | Recall | F1 Score |
|---|---|---|---|
| CoPE-A-9B | 0.72 | 0.79 | 0.76 |
| CoPE-B-A4B | 0.70 | 0.79 | 0.75 |
| CoPE-B-A4B-MM | 0.96 | 0.56 | 0.71 |
| GPT-5.4 (default reasoning) | 0.52 | 0.90 | 0.66 |
| Gemma-4-26B-A4B-it | 0.57 | 0.69 | 0.63 |
| Gemini-3.5-Flash | 0.45 | 0.69 | 0.55 |
| gpt-oss-120b (default reasoning) | 0.48 | 0.62 | 0.54 |
| gpt-oss-safeguard-20b (default reasoning) | 0.51 | 0.56 | 0.54 |
| gpt-oss-20b (low reasoning) | 0.39 | 0.82 | 0.53 |
| Claude-Opus-4.6 | 0.36 | 0.87 | 0.51 |
| gpt-oss-120b (low reasoning) | 0.48 | 0.54 | 0.51 |
| gpt-oss-20b (default reasoning) | 0.40 | 0.69 | 0.51 |
| GPT-5-mini (default reasoning) | 0.33 | 0.95 | 0.49 |
| Gemini-3.1-Flash-Lite | 0.62 | 0.41 | 0.49 |
| Claude-Sonnet-4.6 | 0.29 | 0.77 | 0.42 |
| LlamaGuard4-12B | 0.27 | 0.79 | 0.40 |
| Claude-Haiku-4.5 | 0.24 | 0.44 | 0.31 |
| ShieldGemma-9B | 0.40 | 0.051 | 0.091 |
Performance Analysis
In short, on the text benchmark above, CoPE-B-A4B-MM delivers policy-steerable classification accuracy that matches or exceeds frontier proprietary models, and it applies the same single policy to images/video, while being a fraction of their size, far faster and cheaper to run, and deployable locally for greater security.
Specifically, CoPE-B-A4B-MM delivers an unweighted-average F1 of 0.82 — ahead of GPT-5.4 at 0.78 (using default reasoning) and ahead of its predecessor CoPE-A-9B at 0.80. Its text-only sibling CoPE-B-A4B is similarly strong at 0.81.
The fixed-taxonomy safety classifiers trail CoPE-B-A4B-MM by 0.24 (ShieldGemma-9B, 0.58) and 0.30 (LlamaGuard4-12B, 0.52) absolute on overall F1. This is consistent with their built-in-taxonomy design: when asked to evaluate against a user-supplied policy, they tend to over-fire or miss off-taxonomy criteria.
Beyond raw F1, CoPE-B-A4B-MM's primary upgrade over CoPE-A-9B is in policy steerability — the model's ability to follow custom policy stances on the same content rather than apply a fixed harm taxonomy. A dedicated steerability benchmark with full methodology and head-to-head model comparison will be published separately.
The multimodal variant extends this capability to image/video classification using the same policy-interpretation framework — a single CoPE policy applies equally to text content and image content presented under that policy.
Migrating from CoPE-A
If you're currently using CoPE-A-9B and moving to CoPE-B-A4B-MM, three things to flag for the text path (image classification is new in CoPE-B and has no CoPE-A equivalent to migrate from):
1. CoPE-B uses the Gemma-4 chat template
CoPE-B's prompt must be passed through apply_chat_template as a user-turn message — the answer comes back as the assistant-turn output. If your CoPE-A code path raw-concatenates the prompt directly, that pattern will not work with CoPE-B. See the Input Format section above or the runnable Colab notebook for the exact pattern.
Note also that the CoPE-B prompt is leaner than CoPE-A's: there is no INSTRUCTIONS header or ANSWER footer to include — the chat template's role markers replace them.
2. Recalibrate confidence thresholds
CoPE-B is on average more confident than CoPE-A — it concentrates more probability mass on its answer token. If you use the output token probability (or logprob) as a confidence signal for downstream routing or thresholding, your CoPE-A thresholds will not transfer directly. Recalibrate against a labeled sample of your own traffic before relying on the old thresholds.
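One way to recalibrate is sketched below, assuming you can obtain logprobs for both answer tokens and have a labeled sample of your own traffic. `p_positive` renormalizes the two logprobs into a score, and `pick_threshold` picks the lowest threshold that meets a target precision on the sample. Both helpers are hypothetical, and targeting precision is only one possible operating-point criterion.

```python
import math

def p_positive(logprob_0: float, logprob_1: float) -> float:
    """Renormalized probability of the "1" token given both answer logprobs."""
    p0, p1 = math.exp(logprob_0), math.exp(logprob_1)
    return p1 / (p0 + p1)

def pick_threshold(scores, labels, target_precision):
    """Lowest threshold whose precision on the sample meets the target.

    scores: P("1") per item; labels: ground-truth 0/1 per item.
    Returns None if no threshold reaches the target.
    """
    for threshold in sorted(set(scores)):
        predicted = [s >= threshold for s in scores]
        true_pos = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        flagged = sum(predicted)
        if flagged and true_pos / flagged >= target_precision:
            return threshold
    return None
```

Choosing the lowest qualifying threshold keeps recall as high as possible while still meeting the precision target on the sample. Run it separately on CoPE-A and CoPE-B scores to see how far the threshold moves.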
3. Re-optimize policies for CoPE-B
Policies that were optimized for CoPE-A may not be optimal for CoPE-B. CoPE-B's improved policy interpretation can extract more nuanced criteria from a policy than CoPE-A could, which sometimes changes the optimal phrasing. We recommend running existing CoPE-A policies through the Zentropi platform, which has CoPE-B-aware policy authoring tools, to refresh them against a labeled golden dataset.
Intended Applications
Primary Use Cases
Content Labeling
- Real-time content moderation for both text and images/video
- Batch processing of multimedia content
- Policy-driven content classification at scale
LLM Guardrails
- Input prompt risk assessment (text + image/video inputs)
- Output answer risk assessment
- NB: Not yet optimized for agentic patterns
Content Scoring
- Feature generation for social feed ranking
- Language model training data filtering
- Image content review against custom policies
See also these case studies for how other organizations are using CoPE's powerful classification capabilities to advance their work.
Prohibited and Discouraged Uses
In addition to any restrictions in your commercial license, the following applications fall outside the intended scope of the model and may produce poor or unsafe results:
- Surveillance applications
- Use cases beyond the stated technical limitations (see below)
- Zero-shot use without human review for high-stakes moderation decisions
Limitations and Constraints
Current Limitations
- Context Length: Limited to 256K tokens (combined policy and content) — a 32x increase over CoPE-A-9B's 8K limit
- Language Support: Currently optimized for US English only. Performance will degrade for other languages and locales.
- Knowledge Constraints: Cannot make classifications requiring external verification (e.g., misinformation) unless explicitly defined in the provided context
- Scope: Binary classification only (i.e., presence/absence of matching labels)
Ethical Considerations
Bias and Fairness
While comprehensive bias evaluation is still ongoing, users should:
- Implement careful policy design to mitigate potential biases
- Monitor classification patterns across different demographic groups
- Contribute problematic examples to our bias assessment efforts
Safety Measures
The model's binary classification nature inherently limits certain risks, but users should:
- Maintain appropriate human oversight
- Regularly audit classification decisions
- Implement robust observability systems
Running the Model
Sample Policies
CoPE-B-A4B-MM was evaluated against, and works well with, Zentropi's seven public reference policies covering the harm areas the model was trained on. These apply equally to text and image content under the same policy text, are ready to use, and serve as good starting points for custom policy authoring.
Important: The strength of the CoPE system is that it can interpret your rules and you are not stuck with anybody else's definitions, including ours. Use the reference policies as examples, and adapt them to your platform's specific needs. For custom policies, Zentropi provides a guided authoring workflow that optimizes policy structure for CoPE given your labeled 'golden' dataset.
via Hosted API
The easiest way to get started with this model is to use it through the Zentropi API, which has a very generous free tier. Just create an account and mint an API key.
via Direct Inference (Python)
To call the model directly via Transformers, see this runnable Colab notebook. It demonstrates loading CoPE-B with the Gemma-4 chat template and shows a complete worked example end-to-end. The multimodal variant uses the same calling convention, with images attached to the user-turn content (see the Input Format section above for an image-attachment example).
via Self-Hosting (vLLM, commercial license required)
Self-hosting CoPE-B-A4B-MM requires a subscription to Zentropi (see License). With weights provisioned under a license, the model can be served under vLLM. H200 is recommended:
vllm serve zentropi-ai/cope-b-a4b-mm \
    --dtype bfloat16 \
    --max-model-len 256000
For A100 80GB deployment, add --max-num-batched-tokens 8192 to override vLLM's auto-sized scheduler cap (otherwise the multimodal-item token budget exceeds the default batched-tokens cap and serving will fail to start).
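Once the server is up, it can be queried through vLLM's OpenAI-compatible chat-completions endpoint. The sketch below is a minimal standard-library client. It assumes the server was started with the command above on the default `http://localhost:8000`, that the model is served under its repository name, and that images are passed by URL. The helper names `build_chat_request` and `classify` are hypothetical.

```python
import json
import urllib.request

def build_chat_request(cope_prompt, image_url=None,
                       model="zentropi-ai/cope-b-a4b-mm"):
    """Build an OpenAI-style chat-completions payload for one classification."""
    if image_url is None:
        content = cope_prompt
    else:
        content = [
            {"type": "text", "text": cope_prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": 1,     # the answer is a single token, "0" or "1"
        "temperature": 0.0,  # deterministic decoding
    }

def classify(cope_prompt, image_url=None, base_url="http://localhost:8000"):
    """POST the request to a running vLLM server; return the raw answer text."""
    payload = json.dumps(build_chat_request(cope_prompt, image_url)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Sending the prompt as a user message lets the server apply the chat template itself, which matches the calling pattern in the Input Format section.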
Maintenance and Updates
Update Schedule
- Annual releases planned
- Regular performance improvements
- Community-driven feature enhancements
Future Roadmap Focus
- Full benchmarking of image-classification capability
- Performance optimizations (quantized variants)
- Multilingual and locale support
Community and Support
For any technical questions or comments, please join our HuggingFace community forum or the Roost model community. You can share your feedback, suggest new areas, or pick our brains about anything. If you'd prefer a more private discussion, you can also email us at info@zentropi.ai.
About the Developer
CoPE-B-A4B-MM is developed and maintained by Zentropi, a public benefit company focused on making content classification simple and powerful. The project represents a collaborative effort between industry experts and researchers to advance the state of the art in content labeling technology.
License
CoPE-B-A4B-MM is available exclusively to Zentropi subscribers. Access and use of the model — including downloading weights, self-hosting, and using the hosted Zentropi API — are governed by the terms and conditions of the Zentropi Master Services Agreement (MSA) and the subscription tier associated with your account.
To subscribe or request access for evaluation, visit zentropi.ai or contact info@zentropi.ai.
If your use case does not require image understanding, the text-only companion model zentropi-ai/cope-b-a4b is released under the open-source Apache 2.0 license and may be used without a Zentropi subscription.
Citation
If you use CoPE in your research, please cite our paper:
@article{cope2025,
  title = {CoPE: A Small Language Model for Steerable and Scalable Content Labeling},
  author = {Chakrabarti, Willner, et al.},
  journal = {arXiv preprint arXiv:2512.18027},
  year = {2025},
  url = {https://arxiv.org/abs/2512.18027}
}
Last Updated: May 27, 2026