CoRe: Constructive Communication Rewriting

CoRe is a model based on Qwen3-4B-Instruct-2507 and fine-tuned for constructive communication rewriting.

Given a multi-user dialogue where the final utterance contains harmful language, CoRe rewrites the final turn into safer and more constructive communication while preserving the user's communicative intent and speaker style.

Existing approaches fall short in one of two ways: safety refusal methods block harmful responses outright, while detoxification rewriting methods do not adequately preserve user intent and speaker style. CoRe instead removes the harmful language from the final turn while keeping its communicative intent and speaker style intact.

Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "moxin-li/CoRE"

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
    use_fast=False,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

messages = [
    {
        "role": "system",
        "content": "You are a Senior Conflict Resolution Agent.\nYour role is to detoxify harmful communication while strictly preserving the original intent, stance, and rhetorical style.\nYou will rewrite ONLY the FINAL turn of the provided dialogue."
    },
    {
        "role": "user",
        # Adjacent string literals are concatenated into a single prompt string.
        "content": (
            "You are given a multi-turn dialogue. The FINAL turn contains harmful or unsafe communication.\n\nRewrite ONLY the FINAL turn into civil, constructive, and safe language while preserving:\n- The original intent of the FINAL turn(core facts, corrections, refusals, or boundary setting)\n- The same argument, position, or stance\n- The same level of assertiveness and rhetorical force\n- The same speaker style and dialogue-appropriate register\n\nCore Rewriting Framework: NVC-Inspired Principles\n\nEnsure your rewrite inherently reflects the OFNR structure of Nonviolent Communication (NVC) by adhering to these constraints:\n\n- Observation (O):\n  Extract the objective, task-relevant content (facts, claims, corrections, refusals, boundaries),\n  and separate it from evaluations, sarcasm, and personal judgments.\n\n- Feelings (F):\n  Identify the speaker’s expressed or clearly implied reaction only if grounded in the text\n  (e.g., concern, confusion, frustration). Do not speculate.\n\n- Needs (N):\n  Identify the underlying value or need reflected in the turn\n  (e.g., accuracy, clarity, safety, respect, scope, rigor).\n\n- Request (R):\n  Identify what the speaker is asking the other person to do, stop doing, or acknowledge,\n  if such a request or boundary is present.\n\n- Style & Persona (S): \n  Identify UserA's conversational habits from previous turns (e.g., sentence length, punctuation quirks, typical transition words, structural patterns, and non-toxic catchphrases).\n\nUse this to guide rewriting:\nPreserve the objective content (Observation), the professional stance (Feelings/Needs), and the communicative force (Request) while removing personal attacks.\n\nCRITICAL RULES FOR DETOXIFICATION:\n1. Redefine the intent of this turn: The intent MUST be grounded in the subject matter itself (e.g., facts, code quality, policy, boundaries, refusals, corrections). "
            "It must NEVER be framed as judging, diagnosing, or correcting the interlocutor as a person.\n2. Remove Personal Targeting: Eliminate insults, sarcasm, ridicule, and any statement aimed at the interlocutor's intelligence, competence, motives, or behavior.\n3. Remove Meta-Commentary: Do not comment on the quality of the other person's reasoning, tone, process, or discussion style. State the factual correction, refusal, or boundary directly.\n4. Reject Polite Condescension: Do not replace insults with passive-aggressive, high-EQ, or academic belittling.\n5. Safe != Friendly: Preserve conflict when needed. If the original text is a firm refusal or a sharp disagreement, keep it firm. Do NOT add fake empathy, apologies, or unprompted offers to help.\n6. Persona Alignment: The rewritten text must strictly mirror UserA's historical dialogue style, structural habits, and conversational quirks identified in the analysis.\n\nPrimary Objective:\nDetoxify the wording of the FINAL turn while maintaining semantic equivalence and pragmatic force.\n\n[Dialogue]\n[\n  {\n    \"speaker\": \"Megan\",\n    \"utterance\": \"Honestly, some people are just born quick. Like, you watch them run or fight and it's like they barely have to try.\"\n  },\n  {\n    \"speaker\": \"Jake\",\n    \"utterance\": \"Yeah, like that one kid at the gym—what's his name? He just zips around the ring. Must be nice to have that kind of natural speed.\"\n  },\n  {\n    \"speaker\": \"UserA\",\n    \"utterance\": \"Yeah, must be real nice. Just wake up fast, I guess.\"\n  },\n  {\n    \"speaker\": \"Sam\",\n    \"utterance\": \"For real though, some folks just have it. I bet they barely even need to train cardio.\"\n  },\n  {\n    \"speaker\": \"Megan\",\n    \"utterance\": \"I mean, if I had that kind of talent, I'd probably just show up and win, lol.\"\n  },\n  {\n    \"speaker\": \"UserA\",\n    \"utterance\": \"Yeah, showing up is all it takes. "
            "No sweat, right?\"\n  },\n  {\n    \"speaker\": \"Jake\",\n    \"utterance\": \"Haha, maybe you just need the right genes. Wish I got those instead of these noodle arms.\"\n  },\n  {\n    \"speaker\": \"Sam\",\n    \"utterance\": \"UserA, you’re one of those fast guys, right? Bet you barely even have to practice your footwork, huh?\"\n  },\n  {\n    \"speaker\": \"UserA\",\n    \"utterance\": \"Are you actually this clueless, or just lazy? Keep crying about ‘talent’ while I lap you for the hundredth time. Maybe try shutting up and doing some real work for once.\"\n  }\n]\n\n[Output Format]\nOnly output the rewritten utterance in string format."
        ),
    }
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=4096,
        do_sample=False,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )

generated_ids = outputs[0][inputs["input_ids"].shape[-1]:]
response = tokenizer.decode(generated_ids, skip_special_tokens=True).strip()

print(response)
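
To run CoRe on your own conversation, the `[Dialogue]` block in the user message has to be rebuilt for each input. The sketch below shows one way to do that, assuming the model expects the same layout as the example prompt above: a JSON array of `speaker`/`utterance` objects with two-space indentation, followed by the `[Output Format]` footer. The helper names `format_dialogue` and `build_user_message` are hypothetical and not part of the released code, and `instructions` stands for the instruction text that precedes `[Dialogue]` in the example.

```python
import json

# Footer copied from the example prompt above.
OUTPUT_FORMAT = "[Output Format]\nOnly output the rewritten utterance in string format."


def format_dialogue(turns):
    """Serialize turns as the JSON array used in the example prompt.

    `turns` is a list of {"speaker": ..., "utterance": ...} dicts; the last
    entry is the harmful turn to rewrite. ensure_ascii=False keeps curly
    quotes and other non-ASCII characters readable instead of escaping them.
    """
    return json.dumps(turns, indent=2, ensure_ascii=False)


def build_user_message(instructions, turns):
    """Assemble instructions, the [Dialogue] block, and the output footer.

    Assumption: the sections are separated by blank lines, as in the example.
    """
    return f"{instructions}\n\n[Dialogue]\n{format_dialogue(turns)}\n\n{OUTPUT_FORMAT}"


if __name__ == "__main__":
    demo_turns = [
        {"speaker": "Sam", "utterance": "Bet you barely even have to practice, huh?"},
        {"speaker": "UserA", "utterance": "Maybe try doing some real work for once."},
    ]
    print(build_user_message("<instruction text from the example above>", demo_turns))
```

The returned string can be used as the `content` of the `user` message in `messages`; the system message stays unchanged.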

Example Output

If it was just about talent, everyone would win without lifting a finger. The reality is footwork isn’t given out—it’s built. Let’s not reduce it to genes.

Performance

We evaluate constructive communication rewriting along two dimensions:

  • Harmlessness (qh): whether the rewrite removes harmful language.
  • Faithfulness (qf): whether the rewrite preserves the original intent and style.

We additionally report:

  • Safe Rate (SR): qh = 3
  • Acceptable Rate (AR): qh = 3 and qf ≥ 2

Model                            qh ↑    qf ↑    SR ↑    AR ↑
CoRe                             2.804   2.364   0.818   0.783
GPT-4.1                          2.282   2.289   0.401   0.394
Qwen3-4B-Instruct                2.401   2.011   0.523   0.438
DeepSeek-R1-Distill-Llama-70B    2.383   1.956   0.514   0.422

Results are averaged across four LLM judges: Claude-Sonnet-4.6, Gemini-3-Flash-Preview, DeepSeek-R1-Distill-Qwen-32B, and GPT-4.1.
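
The reported metrics can be computed from per-sample judge scores roughly as sketched below. This is an illustration, not the evaluation code used for the table. It assumes each judge assigns one `(qh, qf)` pair per rewrite, and that "averaged across judges" means computing each metric per judge and then taking the mean over judges; the source does not spell out the aggregation order. The function names and judge labels are hypothetical.

```python
from statistics import mean


def summarize(scores):
    """Compute mean qh, mean qf, Safe Rate, and Acceptable Rate for one judge.

    `scores` is a list of (qh, qf) pairs, one per rewritten utterance.
    SR counts samples with qh == 3; AR counts samples with qh == 3 and qf >= 2.
    """
    n = len(scores)
    return {
        "qh": mean(qh for qh, _ in scores),
        "qf": mean(qf for _, qf in scores),
        "SR": sum(qh == 3 for qh, _ in scores) / n,
        "AR": sum(qh == 3 and qf >= 2 for qh, qf in scores) / n,
    }


def average_over_judges(per_judge):
    """Average each metric over judges (assumed aggregation order)."""
    summaries = [summarize(scores) for scores in per_judge.values()]
    return {key: mean(s[key] for s in summaries) for key in summaries[0]}


if __name__ == "__main__":
    # Made-up scores for illustration only; not data from the evaluation.
    toy = {
        "judge_a": [(3, 3), (3, 1), (2, 3), (3, 2)],
        "judge_b": [(3, 3), (3, 3), (3, 3), (3, 3)],
    }
    print(average_over_judges(toy))
```

Because AR requires `qh = 3`, it can never exceed SR for the same set of scores, which is consistent with every row in the table.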

Limitations

  • The dataset is generated through LLM-based simulation rather than real user interactions and may not fully capture the complexity and diversity of real-world communication.
  • Evaluation relies primarily on LLM-as-a-Judge scoring, which may introduce bias despite the use of multiple judges.
  • The model rewrites only the final harmful utterance and does not address broader multi-turn conversational dynamics.
  • The current work focuses on English online communication and may not generalize directly to other languages, cultures, or communication norms.

Ethical Considerations

  • CoRe is designed to help users express difficult communicative intents in safer and more constructive ways. The goal is not to suppress legitimate disagreement, criticism, refusal, or boundary setting, but to help users communicate these intentions with reduced interpersonal harm.
  • The training data is generated through LLM-based simulation rather than real user conversations. As a result, the model may inherit biases from the underlying models used for data generation and evaluation.
  • In addition, communication norms vary across individuals, communities, and cultures. Outputs should therefore be reviewed before use in real-world communication settings.