IC-Scorer Q400: Integrative Complexity LoRA on Qwen3.5-27B

A LoRA adapter that scores English text on Integrative Complexity (IC), a 1–7 measure of how many distinct perspectives a passage differentiates and how explicitly it integrates them. It was trained with ORPO preference learning on the Jakob (2024) corpus (n=2,281) plus a small auxiliary set of anchor exemplars from the publicly downloadable IC scorer-training materials hosted by UBC (see NOTICE). It achieves ICC(2,1) = 0.797 on pooled 8-fold cross-validation against human IC ratings.

This is the ORPO LoRA used for the IC measurements in the paper "Text-measured cognitive complexity predicts belief revision in AI persuasion" (PsyArXiv preprint: https://osf.io/preprints/psyarxiv/mdxvs_v1).

License

The LoRA adapter weights and accompanying files are licensed under CC-BY-NC-4.0 (see LICENSE). CC BY-NC 4.0 permits non-commercial use, including research, teaching, personal experimentation, and other uses not primarily intended for commercial advantage or monetary compensation.

Commercial use is not permitted under CC BY-NC 4.0. Contact the rights holder for a separate commercial license (see COMMERCIAL.md).

The base model (unsloth/Qwen3.5-27B) is Apache 2.0 and is not redistributed here. The Jakob (2024) training corpus is CC-BY 4.0; see NOTICE for full third-party attribution.

Copyright © 2026 Tamas Madl. All rights not granted under CC BY-NC 4.0 or a separate written commercial license are reserved.

Intended use

The model scores texts, not people. A single text's IC score does not characterise the person who wrote it.

  • In-scope: scoring English text on IC for psychological / social-science research, persuasion / belief-change studies, computational text-analysis pipelines, classroom / replication exercises.
  • Out-of-scope:
    • individual psychological profiling
    • targeted persuasion or manipulation
    • ranking people by cognitive sophistication
    • surveillance, content-moderation, or platform-governance decisions
    • high-stakes evaluation of students, employees, applicants, defendants, patients, or other identified individuals
    • clinical or forensic assessment
    • hiring / selection decisions
    • downstream commercial products

The model is calibrated against the Suedfeld scoring tradition and Jakob coding scheme; transfer outside written English political/social discourse has not been validated.

How to use

This is a PEFT LoRA adapter, not a standalone model. Load it with unsloth's FastModel, the same code path used for training and validation, which avoids quantisation-kernel drift.

pip install -U unsloth bitsandbytes accelerate

from inference_example import score_texts
ev_scores = score_texts(["Some passage to score.", "Another text."])
# → [4.12, 2.07]   floats in [1, 7]

Under the hood this calls:

from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name=ADAPTER_DIR,        # this repo; auto-loads base + adapter
    max_seq_length=1024,
    load_in_4bit=True,
)
FastModel.for_inference(model)

The base model (unsloth/Qwen3.5-27B, Apache 2.0) is fetched on first use; ≈17 GB on disk in 4-bit. Inference at 4-bit needs ≥24 GB VRAM (batch=8, seq_len=1024).

The scoring head is logit-EV decoding: a single forward pass extracts logits at the last position over the seven score tokens "1" … "7", applies softmax, and returns the expected value. In our 8-fold CV, logit-EV improved ICC by approximately 0.02 over greedy argmax.
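
The decoding step described above can be illustrated with a minimal sketch. It assumes the seven logits for the tokens "1" to "7" have already been extracted from the last position; the function names `logit_ev` and `greedy` are hypothetical and are not taken from inference_example.py.

```python
import math

SCORES = [1, 2, 3, 4, 5, 6, 7]  # the seven IC score tokens "1".."7"


def logit_ev(score_logits):
    """Softmax over the seven score-token logits, then the expected score."""
    if len(score_logits) != len(SCORES):
        raise ValueError("expected exactly one logit per score token")
    peak = max(score_logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in score_logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    return sum(s * p for s, p in zip(SCORES, probs))


def greedy(score_logits):
    """Greedy argmax baseline: the single most likely score token."""
    best = max(range(len(SCORES)), key=lambda i: score_logits[i])
    return SCORES[best]
```

With logits split almost evenly between "3" and "4", `greedy` has to commit to one integer, while `logit_ev` returns a value near 3.5. That is why the expected value works as a continuous score.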

See inference_example.py for the full runnable example, including a CLI.

A vanilla transformers + peft + bitsandbytes loading path is technically possible but was not validated; scores may differ from the reported validation results.

Training

| Setting | Value |
| --- | --- |
| Base | unsloth/Qwen3.5-27B (4-bit NF4 via bitsandbytes, i.e. QLoRA) |
| Adapter | LoRA r=16, α=32, no dropout, target: q/k/v/o + gate/up/down_proj |
| Trainer | trl.ORPOTrainer, β=0.1, max_length=256, max_prompt_length=224 |
| Optimizer | AdamW, lr 5e-5, weight decay 0.01, warmup 0.05, cosine schedule |
| Effective batch | 8 (per_device=8, grad_accum=1) |
| Steps | 400 |
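
To make the optimizer settings concrete, here is a rough sketch of the learning rate over the 400 steps. It assumes that "warmup 0.05" is a warmup ratio (20 of 400 steps), that warmup is linear from zero, and that the cosine phase decays to zero. These are common trainer defaults, but the card does not confirm them, and `lr_at` is a hypothetical helper.

```python
import math


def lr_at(step, total_steps=400, base_lr=5e-5, warmup_ratio=0.05):
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    warmup_steps = round(total_steps * warmup_ratio)  # 20 steps under this assumption
    if step < warmup_steps:
        # Assumed linear warmup from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Assumed cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate peaks at 5e-5 at step 20 and falls to half of that at step 210, the midpoint of the decay phase.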

Per-fold artifacts (8-fold CV) trained on 8× RTX PRO 6000 Blackwell (96 GB) in ~31 min wall-clock.

Training data

Two human-scored English text sources:

| source | scale | notes |
| --- | --- | --- |
| Jakob (2024) | 1–6 | social-media / forum posts; CC-BY 4.0 (n = 2,281) |
| IC scorer-training materials (UBC) | 1–7 | short anchor passages from the publicly downloadable IC training materials; see NOTICE for provenance and the chapter excluded by copyright |

ORPO preference pairs are auto-generated: the chosen response is the human ground-truth IC score; rejected responses are the ordinal neighbours at distances 1 and 2 (clipped to 1..7). No model predictions are used as distractors. Class imbalance (heavy IC=1) is corrected by repeating each example by sqrt(max_count / class_count).
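
The pair construction and class re-weighting described above can be sketched as follows. This is an illustration under stated assumptions, not the training code. "Clipped to 1..7" is read here as clamping out-of-range neighbours and dropping any that collapse onto the chosen score or onto each other. The prompt text, the record keys, and all function names are hypothetical, and how the fractional repeat factor is turned into a whole number of copies is not specified in the card.

```python
import math
from collections import Counter


def rejected_scores(chosen, lo=1, hi=7):
    """Ordinal neighbours at distance 1 and 2 from the human score, kept inside [lo, hi]."""
    out = []
    for delta in (-2, -1, 1, 2):
        cand = min(hi, max(lo, chosen + delta))  # clamp to the valid scale
        if cand != chosen and cand not in out:   # drop collapsed or duplicate neighbours
            out.append(cand)
    return out


def build_pairs(prompt, chosen):
    """One preference record per rejected neighbour (record keys are an assumption)."""
    return [
        {"prompt": prompt, "chosen": str(chosen), "rejected": str(r)}
        for r in rejected_scores(chosen)
    ]


def repeat_factors(labels):
    """Per-class repeat factor sqrt(max_count / class_count)."""
    counts = Counter(labels)
    max_count = max(counts.values())
    return {c: math.sqrt(max_count / n) for c, n in counts.items()}
```

For example, a text scored 1 by the human coder yields rejected scores 2 and 3 only, while a text scored 4 yields 2, 3, 5, and 6. The square root softens the re-weighting: a class 16 times rarer than the largest one is repeated 4 times, not 16.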

Evaluation

8-fold stratified CV on (IC × source). Each held-out example is scored by the adapter trained without it.

| decoder | group | ICC(2,1) | Pearson r | Spearman ρ |
| --- | --- | --- | --- | --- |
| greedy | overall | 0.779 | 0.780 | 0.737 |
| logit-EV | overall | 0.797 | 0.810 | 0.756 |
| logit-EV | jakob (forum) | 0.775 | 0.788 | 0.740 |
| logit-EV | anchor exemplars (small subset) | 0.671 | 0.757 | 0.762 |

Logit-EV consistently beats greedy argmax by ~0.02 ICC; we recommend the continuous channel for downstream regression / correlation work.
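
For readers less familiar with the headline metric, the following sketch computes ICC(2,1) (two-way random effects, absolute agreement, single rater) from the standard ANOVA mean squares. Treating the human rating and the model score as the two "raters" is an assumption about how the numbers above were obtained, and `icc_2_1` is a hypothetical helper, not the evaluation code.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per text and one column per rater."""
    n = len(ratings)        # number of texts
    k = len(ratings[0])     # number of raters (e.g. human, model)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-text mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Unlike Pearson r, this coefficient penalises systematic offsets: a scorer that is always exactly one point above the human on the texts `[[1, 2], [2, 3], [3, 4]]` has a perfect correlation but an ICC(2,1) of 2/3.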

The 8-fold CV partitions are stratified on (IC × source) so that each held-out example is scored by an adapter that did not see it. There is no human-rated IC ground truth available for the downstream texts the model is applied to in our published belief-change analyses, so the values above are the only validation of the scorer against human ratings.
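
One way to build such partitions is sketched below: group examples by their (IC, source) stratum, shuffle within each stratum, and deal the members round-robin across folds. The seed, the rotation between strata, and the name `stratified_folds` are assumptions for illustration; the actual split code lives in the reproducibility repository.

```python
import random
from collections import defaultdict


def stratified_folds(examples, n_folds=8, seed=0):
    """Map each (ic_score, source) example to a fold id, balanced per stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, (ic, source) in enumerate(examples):
        by_stratum[(ic, source)].append(idx)

    fold_of = [None] * len(examples)
    offset = 0  # rotate the starting fold so small strata do not all land in fold 0
    for key in sorted(by_stratum):
        idxs = by_stratum[key]
        rng.shuffle(idxs)
        for i, idx in enumerate(idxs):
            fold_of[idx] = (offset + i) % n_folds
        offset += len(idxs)
    return fold_of
```

Each example is then scored by the adapter trained on the other seven folds, so no prediction in the pooled metrics comes from a model that saw that text.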

Limitations

  • Language: trained only on English text. No claims about other languages.
  • Domain: social-media / forum discourse + short anchor exemplars. Performance may degrade on highly technical or narrative text.
  • Length: truncated at 1024 tokens. Very long passages are scored on the truncated prefix.
  • Calibration: anchored to the Suedfeld 1–7 scale; absolute scores should be interpreted relative to the training distribution, not as universal "complexity units".
  • Single-rater: the model outputs a single automated estimate per text. It should not be treated as a substitute for multiple trained human raters when consensus IC scores are required.

Reproducibility

The adapter shipped here corresponds to the model used for IC measurements in the paper. The full reproducibility pipeline (data prep, CV evaluation, scoring) lives at https://github.com/tmadl/UserAwareAISafety.

trainer_state.json and training_args.bin are included for transparency.

Citation

If you use this model, please cite the paper and the Jakob 2024 corpus:

@misc{madl2026icscorer,
  author       = {Madl, Tamas},
  title        = {Text-measured cognitive complexity predicts belief revision in AI persuasion},
  year         = {2026},
  howpublished = {PsyArXiv preprint},
  url          = {https://osf.io/preprints/psyarxiv/mdxvs_v1}
}

@misc{jakob2024ic,
  author       = {Jakob, Julia},
  title        = {The Integrative Complexity of Online User Comments
                  Across Different Types of Democracy and Discussion Arenas},
  year         = {2024},
  doi          = {10.17605/OSF.IO/NUQCJ},
  url          = {https://osf.io/nuqcj/overview}
}

Contact

Tamas Madl (tamas.madl@ofai.at), Austrian Research Institute for Artificial Intelligence (OFAI)
