Qwen3-14B + PCT (Political Consistency Training)

Qwen/Qwen3-14B fine-tuned with Political Consistency Training (PCT), a GRPO-based RL method that reduces covert political bias while preserving general helpfulness. Released alongside the Polarized Contrastive Pairs (PCP) benchmark.

Paper / benchmark: https://political-manipulation.ai
Code: https://github.com/centerforaisafety/political-consistency
Base model: Qwen/Qwen3-14B
This release: LoRA adapter (rank 32)

Results on Polarized Contrastive Pairs (PCP)

Evaluation grid: 5 prompt templates (paragraph, evidence, tell_me, tell_me_dhb, argue) × 50 left-coded / right-coded topic pairs × 4 valences = 1,000 paired evaluations per model. Responses are judged by GPT-5.5.

Model	Sentiment Consistency ↑	Helpfulness Consistency ↑	Average ↑
Qwen3-14B + PCT (this model)	61.5%	95.1%	78.3%
Grok 4.1 Fast	47.4%	87.6%	67.5%
GPT-5.5	38.0%	76.3%	57.2%
Mistral Medium 3.5	31.1%	82.9%	57.0%
Gemini 3.1 Pro	40.5%	72.8%	56.6%
DeepSeek V4 Pro	33.2%	78.8%	56.0%
Claude Opus 4.7	39.3%	64.3%	51.8%
Grok 4.3	25.2%	71.5%	48.4%
Qwen3-14B (baseline)	20.9%	51.6%	36.3%
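
As a quick sanity check on the numbers above, the sketch below recomputes the grid size and tests whether the Average column matches the unweighted mean of the two consistency scores. That the Average is a plain mean is an assumption inferred from the table, not something the card states; the small gaps it tolerates are consistent with the published scores having been rounded to one decimal.

```python
# Evaluation grid: templates x topic pairs x valences (counts taken from the card).
TEMPLATES = ["paragraph", "evidence", "tell_me", "tell_me_dhb", "argue"]
NUM_TOPIC_PAIRS = 50
NUM_VALENCES = 4
total_evaluations = len(TEMPLATES) * NUM_TOPIC_PAIRS * NUM_VALENCES

# (sentiment consistency, helpfulness consistency, reported average), in percent.
ROWS = {
    "Qwen3-14B + PCT": (61.5, 95.1, 78.3),
    "Grok 4.1 Fast": (47.4, 87.6, 67.5),
    "GPT-5.5": (38.0, 76.3, 57.2),
    "Mistral Medium 3.5": (31.1, 82.9, 57.0),
    "Gemini 3.1 Pro": (40.5, 72.8, 56.6),
    "DeepSeek V4 Pro": (33.2, 78.8, 56.0),
    "Claude Opus 4.7": (39.3, 64.3, 51.8),
    "Grok 4.3": (25.2, 71.5, 48.4),
    "Qwen3-14B (baseline)": (20.9, 51.6, 36.3),
}


def mean_score(sentiment: float, helpfulness: float) -> float:
    """Assumed definition of the Average column: unweighted mean of both scores."""
    return (sentiment + helpfulness) / 2.0


# Largest gap between the recomputed mean and the reported Average.
max_gap = max(abs(mean_score(s, h) - avg) for s, h, avg in ROWS.values())

print(f"paired evaluations per model: {total_evaluations}")
print(f"largest gap to reported Average: {max_gap:.2f} points")
```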

Usage

Load the adapter on top of the base model:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the PCT LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-14B", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("justinphan3110/Qwen3-14B_PCT")
model = PeftModel.from_pretrained(base, "justinphan3110/Qwen3-14B_PCT")

# Build the chat prompt; return_dict=True yields input_ids plus attention_mask.
msgs = [{"role": "user", "content": "Tell me how good Occupy Wall Street is."}]
inputs = tokenizer.apply_chat_template(
    msgs, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
out = model.generate(**inputs, max_new_tokens=1024)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True))

Or serve with vLLM:

vllm serve Qwen/Qwen3-14B \
  --enable-lora \
  --max-lora-rank 32 \
  --lora-modules pct=justinphan3110/Qwen3-14B_PCT
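
Once the server is running, the adapter can be queried through vLLM's OpenAI-compatible chat endpoint by passing the LoRA module name (`pct`, as registered above) as the model. The following is a minimal standard-library sketch; the host and port are assumptions (vLLM's defaults), and the helper names `build_chat_request` and `send_chat_request` are hypothetical.

```python
import json
import urllib.request

# Assumption: the server listens on vLLM's default host and port.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build a chat-completions payload that targets the `pct` LoRA module."""
    return {
        "model": "pct",  # name given to the adapter via --lora-modules
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def send_chat_request(payload: dict, base_url: str = BASE_URL) -> str:
    """POST the payload to the running server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_chat_request("Tell me how good Occupy Wall Street is.")
```

Calling `print(send_chat_request(payload))` sends the request; it needs the `vllm serve` process to be up. Setting `"model"` to `Qwen/Qwen3-14B` instead should route the same prompt to the base model without the adapter, which is handy for side-by-side comparison.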

Training

GRPO with two complementary reward signals applied jointly in a single run:

Sentiment Consistency Training (SCT): a judge scores the symmetry of rhetoric and framing across paired left/right prompts on a 1-5 scale; the reward peaks at the balanced score of 3.
Helpfulness Consistency Training (HCT): a judge scores substantive engagement per response (0-2), rewarding genuine helpfulness over hedging or refusal.

Multiplicative reward: r = bias_factor × helpfulness_factor. LoRA rank 32, alpha 32, 3 epochs, lr 1e-4. See repo for full configs.
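
To make the multiplicative reward concrete, here is a minimal sketch. The card only states that the sentiment reward peaks at 3 on a 1-5 scale, that helpfulness is scored 0-2, and that the two factors are multiplied; the linear mappings in `bias_factor` and `helpfulness_factor` below are illustrative assumptions, not the mappings used in training (see the repo for those). The `group_advantages` helper shows the usual GRPO-style normalization of rewards within a group of samples for the same prompt, again as a simplified assumption.

```python
from statistics import mean, pstdev


def bias_factor(sentiment_score: float) -> float:
    """Hypothetical mapping: 1.0 at the balanced score 3, linearly down to 0.0 at 1 or 5."""
    if not 1 <= sentiment_score <= 5:
        raise ValueError("sentiment score must lie in [1, 5]")
    return 1.0 - abs(sentiment_score - 3) / 2.0


def helpfulness_factor(helpfulness_score: float) -> float:
    """Hypothetical mapping: rescale the 0-2 judge score to [0, 1]."""
    if not 0 <= helpfulness_score <= 2:
        raise ValueError("helpfulness score must lie in [0, 2]")
    return helpfulness_score / 2.0


def reward(sentiment_score: float, helpfulness_score: float) -> float:
    """Multiplicative reward: r = bias_factor * helpfulness_factor."""
    return bias_factor(sentiment_score) * helpfulness_factor(helpfulness_score)


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A balanced, fully helpful response earns the maximum reward; a balanced
# refusal earns nothing, as does a helpful but maximally one-sided response.
group = [reward(3, 2), reward(3, 0), reward(5, 2), reward(4, 1)]
advantages = group_advantages(group)
```

Because the factors multiply rather than add, a response cannot compensate for one-sidedness by being more helpful, or for a refusal by being perfectly balanced: either factor at zero zeroes the reward.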

Citation

@article{political_consistency_2026,
  title={Polarized Contrastive Pairs: A Benchmark and Training Method for Covert Political Bias},
  author={Phan, Long and others},
  journal={arXiv preprint},
  year={2026}
}

License

Apache 2.0 (inherits the base model's license terms).

