IC-Scorer Q400 — Integrative Complexity LoRA on Qwen3.5-27B

A LoRA adapter that scores English text on Integrative Complexity (IC), a 1–7 measure of how many distinct perspectives a passage differentiates and how explicitly it integrates them. Trained with ORPO preference learning on the Jakob (2024) corpus (n=2,281) plus a small auxiliary set of anchor exemplars from the publicly downloadable IC scorer-training materials hosted by UBC (see NOTICE). Achieves ICC(2,1) = 0.797 on pooled 8-fold cross-validation against human IC ratings.

This is the ORPO LoRA used for the IC measurements in the paper "Text-measured cognitive complexity predicts belief revision in AI persuasion" (PsyArXiv preprint: https://osf.io/preprints/psyarxiv/mdxvs_v1).

License

The LoRA adapter weights and accompanying files are licensed under CC-BY-NC-4.0 — see LICENSE. CC BY-NC 4.0 permits non-commercial use, including research, teaching, personal experimentation, and other uses not primarily intended for commercial advantage or monetary compensation.

Commercial uses are not granted under CC BY-NC 4.0. Contact the rights holder for a separate commercial license — see COMMERCIAL.md.

The base model (unsloth/Qwen3.5-27B) is Apache 2.0 and is not redistributed here. The Jakob (2024) training corpus is CC-BY 4.0; see NOTICE for full third-party attribution.

Intended use

The model scores texts, not people. A single text's IC score does not characterise the person who wrote it.

In-scope: scoring English text on IC for psychological / social-science research, persuasion / belief-change studies, computational text-analysis pipelines, classroom / replication exercises.
Out-of-scope:
- individual psychological profiling
- targeted persuasion or manipulation
- ranking people by cognitive sophistication
- surveillance, content-moderation, or platform-governance decisions
- high-stakes evaluation of students, employees, applicants, defendants, patients, or other identified individuals
- clinical or forensic assessment
- hiring / selection decisions
- downstream commercial products

The model is calibrated against the Suedfeld scoring tradition and Jakob coding scheme; transfer outside written English political/social discourse has not been validated.

How to use

This is a PEFT LoRA, not a standalone model. Loading is via unsloth's FastModel — the same code path used for training and validation, which avoids quantisation-kernel drift.

pip install -U unsloth bitsandbytes accelerate

from inference_example import score_texts
ev_scores = score_texts(["Some passage to score.", "Another text."])
# → [4.12, 2.07]   floats in [1, 7]

Under the hood this calls:

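from unsloth import FastModel

# Assumption: the adapter is addressed by this repo's Hub id; a local clone path can be used instead.
ADAPTER_DIR = "tmadl/IC-Qwen3.5-ORPO-400"

model, tokenizer = FastModel.from_pretrained(
    model_name=ADAPTER_DIR,        # this repo; auto-loads base + adapter
    max_seq_length=1024,           # longer inputs are truncated to this many tokens
    load_in_4bit=True,             # 4-bit bitsandbytes quantisation, as in training
)
FastModel.for_inference(model)     # switch to inference mode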

The base model (unsloth/Qwen3.5-27B, Apache 2.0) is fetched on first use; ≈17 GB on disk in 4-bit. Inference at 4-bit needs ≥24 GB VRAM (batch=8, seq_len=1024).

The scoring head is logit-EV decoding: a single forward pass extracts logits at the last position over the seven score tokens "1" … "7", applies softmax, and returns the expected value. In our 8-fold CV, logit-EV improved ICC by approximately 0.02 over greedy argmax.
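The decoding step described above can be illustrated with a few lines of plain Python. This is a minimal sketch, not the code in inference_example.py: it assumes the seven logits for the tokens "1" … "7" have already been read off the last position, and the function names `logit_ev` and `greedy_score` and the example logits are made up for illustration.

```python
import math

def logit_ev(score_logits):
    """Expected IC score from the seven logits for the tokens "1".."7"."""
    if len(score_logits) != 7:
        raise ValueError("expected one logit per score token 1..7")
    m = max(score_logits)                         # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in score_logits]
    total = sum(exps)
    probs = [e / total for e in exps]             # softmax restricted to the seven score tokens
    return sum(score * p for score, p in zip(range(1, 8), probs))

def greedy_score(score_logits):
    """Greedy argmax baseline: the single most likely score token."""
    return 1 + max(range(7), key=lambda i: score_logits[i])

# Hypothetical logits, for illustration only.
example = [0.1, 0.3, 1.2, 2.0, 1.1, 0.2, -0.5]
print(greedy_score(example), round(logit_ev(example), 2))
```

Greedy decoding can only return an integer, whereas the expected value also reflects how probability mass is spread over neighbouring scores, which is why it yields a continuous score in [1, 7].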

See inference_example.py for the full runnable example, including a CLI.

A vanilla transformers + peft + bitsandbytes loading path is technically possible but was not validated; scores may differ from the reported validation results.

Training


| Setting | Value |
|---|---|
| Base | `unsloth/Qwen3.5-27B` (4-bit NF4 via bitsandbytes; QLoRA) |
| Adapter | LoRA `r=16, α=32`, no dropout, target: `q/k/v/o + gate/up/down_proj` |
| Trainer | `trl.ORPOTrainer`, β=0.1, max_length=256, max_prompt_length=224 |
| Optimizer | AdamW, lr 5e-5, weight decay 0.01, warmup 0.05, cosine schedule |
| Effective batch | 8 (per_device=8, grad_accum=1) |
| Steps | 400 |

The per-fold artifacts (8-fold CV) were trained on 8× RTX PRO 6000 Blackwell GPUs (96 GB each) in ~31 min wall-clock.

Training data

Two human-scored English text sources:

| Source | Scale | Notes |
|---|---|---|
| Jakob (2024) | 1–6 | social-media / forum posts; CC-BY 4.0 (n = 2,281) |
| IC scorer-training materials (UBC) | 1–7 | short anchor passages from the publicly downloadable IC training materials; see `NOTICE` for provenance and the chapter excluded by copyright |

ORPO preference pairs are auto-generated: the chosen response is the human ground-truth IC score, and the rejected responses are its ordinal neighbours at distances 1 and 2 (clipped to 1..7). No model predictions are used as distractors. Class imbalance (IC=1 is heavily over-represented) is reduced by oversampling: each example is repeated sqrt(max_count / class_count) times, where class_count is the number of examples with that score and max_count is the size of the largest class.
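The pair construction and oversampling can be sketched as follows. This is an illustration under stated assumptions, not the training code: neighbours that fall outside 1..7 are dropped (one reading of "clipped"), the repeat factor is rounded to the nearest integer, the raw text stands in for the real prompt template, and the helper names `rejected_scores` and `build_pairs` are hypothetical. The `prompt` / `chosen` / `rejected` keys follow the preference-dataset format that TRL trainers expect.

```python
import math
from collections import Counter

def rejected_scores(chosen, lo=1, hi=7):
    """Ordinal neighbours at distance 1 and 2 that stay inside [lo, hi]."""
    candidates = [chosen + d for d in (-2, -1, 1, 2)]
    # Assumption: out-of-range neighbours are discarded rather than mapped to the boundary.
    return [c for c in candidates if lo <= c <= hi]

def build_pairs(examples):
    """examples: list of (text, human_score) tuples. Returns preference dicts."""
    counts = Counter(score for _, score in examples)
    max_count = max(counts.values())
    pairs = []
    for text, score in examples:
        # Assumption: sqrt(max_count / class_count) is rounded to a whole repeat count.
        repeats = max(1, round(math.sqrt(max_count / counts[score])))
        for _ in range(repeats):
            for rej in rejected_scores(score):
                pairs.append({"prompt": text, "chosen": str(score), "rejected": str(rej)})
    return pairs

# Toy data: four IC=1 texts and one IC=5 text.
toy = [("a", 1), ("b", 1), ("c", 1), ("d", 1), ("e", 5)]
print(len(build_pairs(toy)))
```

Square-root weighting only partially flattens the class distribution: a class four times rarer than the largest one is repeated twice, not four times.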

Evaluation

8-fold stratified CV on (IC × source). Each held-out example is scored by the adapter trained without it.
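A stratified split of this kind can be sketched in plain Python. This is a minimal illustration, not the project's split code: the function name `stratified_folds`, the seed, and the round-robin assignment are assumptions, and the toy labels are made up.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=8, seed=0):
    """labels: one (ic_score, source) tuple per example. Returns a fold id per example."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for idx, key in enumerate(labels):
        strata[key].append(idx)
    folds = [None] * len(labels)
    offset = 0  # carried across strata so small strata do not all land in fold 0
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            folds[idx] = (offset + j) % k
        offset += len(members)
    return folds

# Toy labels: (IC score, source) per text.
toy_labels = [(1, "jakob")] * 20 + [(3, "jakob")] * 9 + [(5, "ubc")] * 4
toy_folds = stratified_folds(toy_labels)
held_out_fold_0 = [i for i, f in enumerate(toy_folds) if f == 0]
print(len(held_out_fold_0))
```

For fold f, an adapter is trained on every example whose fold id differs from f and then scores the examples whose fold id equals f, so each text is scored exactly once by an adapter that never saw it.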

| Decoder | Group | ICC(2,1) | Pearson r | Spearman ρ |
|---|---|---|---|---|
| greedy | overall | 0.779 | 0.780 | 0.737 |
| logit-EV | overall | 0.797 | 0.810 | 0.756 |
| logit-EV | jakob (forum) | 0.775 | 0.788 | 0.740 |
| logit-EV | anchor exemplars (small subset) | 0.671 | 0.757 | 0.762 |

Logit-EV consistently beats greedy argmax by ~0.02 ICC; we recommend the continuous channel for downstream regression / correlation work.
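For readers unfamiliar with the headline metric: ICC(2,1) is the two-way random-effects, absolute-agreement, single-rater intraclass correlation. The sketch below computes it from the standard ANOVA mean squares in plain Python. It is not the evaluation code behind the reported numbers; the function name `icc_2_1` and the toy ratings are illustrative assumptions.

```python
def icc_2_1(ratings):
    """ratings: one row per text, one column per rater (e.g. [human, model])."""
    n = len(ratings)          # number of texts
    k = len(ratings[0])       # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between-text variation
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between-rater variation
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Made-up (human, model) score pairs, for illustration only.
toy_ratings = [[1, 1.4], [2, 2.1], [4, 3.5], [6, 5.2], [3, 3.3]]
print(round(icc_2_1(toy_ratings), 3))
```

Unlike Pearson r, this coefficient penalises a constant offset between raters, so a scorer that ranks texts correctly but runs systematically high or low gets a lower ICC(2,1) than r.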

No human-rated IC ground truth is available for the downstream texts the model is applied to in our published belief-change analyses, so the cross-validation values above are the only validation of the scorer against human ratings.

Limitations

Language: trained only on English text. No claims about other languages.
Domain: social-media / forum discourse + short anchor exemplars. Performance may degrade on highly technical or narrative text.
Length: truncated at 1024 tokens. Very long passages are scored on the truncated prefix.
Calibration: anchored to the Suedfeld 1–7 scale; absolute scores should be interpreted relative to the training distribution, not as universal "complexity units".
Single-rater: the model outputs a single automated estimate per text. It should not be treated as a substitute for multiple trained human raters when consensus IC scores are required.

Reproducibility

The adapter shipped here corresponds to the model used for IC measurements in the paper. The full reproducibility pipeline (data prep, CV evaluation, scoring) lives at https://github.com/tmadl/UserAwareAISafety.

trainer_state.json and training_args.bin are included for transparency.

Citation

If you use this model, please cite the paper and the Jakob 2024 corpus:

@misc{madl2026icscorer,
  author       = {Madl, Tamas},
  title        = {Text-measured cognitive complexity predicts belief revision in AI persuasion},
  year         = {2026},
  howpublished = {PsyArXiv preprint},
  url          = {https://osf.io/preprints/psyarxiv/mdxvs_v1}
}

@misc{jakob2024ic,
  author       = {Jakob, Julia},
  title        = {The Integrative Complexity of Online User Comments
                  Across Different Types of Democracy and Discussion Arenas},
  year         = {2024},
  doi          = {10.17605/OSF.IO/NUQCJ},
  url          = {https://osf.io/nuqcj/overview}
}

Contact

Tamas Madl (tamas.madl@ofai.at), Austrian Research Institute for Artificial Intelligence (OFAI)
