IC-Scorer Q400: Integrative Complexity LoRA on Qwen3.5-27B
A LoRA adapter that scores English text on Integrative Complexity (IC), a 1–7 measure of how many distinct perspectives a passage differentiates and how explicitly it integrates them. Trained with ORPO preference learning on the Jakob (2024) corpus (n=2,281) plus a small auxiliary set of anchor exemplars from the publicly downloadable IC scorer-training materials hosted by UBC (see NOTICE). Achieves ICC(2,1) = 0.797 on pooled 8-fold cross-validation against human IC ratings.
This is the ORPO LoRA used for the IC measurements in the paper "Text-measured cognitive complexity predicts belief revision in AI persuasion" (PsyArXiv preprint: https://osf.io/preprints/psyarxiv/mdxvs_v1).
License
The LoRA adapter weights and accompanying files are licensed under CC-BY-NC-4.0; see LICENSE. CC BY-NC 4.0 permits non-commercial use, including research, teaching, personal experimentation, and other uses not primarily intended for commercial advantage or monetary compensation.
Commercial uses are not granted under CC BY-NC 4.0. Contact the rights holder for a separate commercial license; see COMMERCIAL.md.
The base model (unsloth/Qwen3.5-27B) is Apache 2.0 and is not redistributed here. The Jakob (2024) training corpus is CC-BY 4.0; see NOTICE for full third-party attribution.
Copyright © 2026 Tamas Madl. All rights not granted under CC BY-NC 4.0 or a separate written commercial license are reserved.
Intended use
The model scores texts, not people. A single text's IC score does not characterise the person who wrote it.
- In-scope: scoring English text on IC for psychological / social-science research, persuasion / belief-change studies, computational text-analysis pipelines, classroom / replication exercises.
- Out-of-scope:
- individual psychological profiling
- targeted persuasion or manipulation
- ranking people by cognitive sophistication
- surveillance, content-moderation, or platform-governance decisions
- high-stakes evaluation of students, employees, applicants, defendants, patients, or other identified individuals
- clinical or forensic assessment
- hiring / selection decisions
- downstream commercial products
The model is calibrated against the Suedfeld scoring tradition and Jakob coding scheme; transfer outside written English political/social discourse has not been validated.
How to use
This is a PEFT LoRA, not a standalone model. Load it through unsloth's FastModel, the same code path used for training and validation, which avoids quantisation-kernel drift.
pip install -U unsloth bitsandbytes accelerate
from inference_example import score_texts
ev_scores = score_texts(["Some passage to score.", "Another text."])
# → [4.12, 2.07]  (floats in [1, 7])
Under the hood this calls:
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name=ADAPTER_DIR,  # this repo; auto-loads base + adapter
    max_seq_length=1024,
    load_in_4bit=True,
)
FastModel.for_inference(model)
The base model (unsloth/Qwen3.5-27B, Apache 2.0) is fetched on first use and takes ≈17 GB on disk in 4-bit. Inference at 4-bit needs ≥24 GB VRAM (batch=8, seq_len=1024).
The scoring head is logit-EV decoding: a single forward pass extracts logits at the last position over the seven score tokens "1" … "7", applies softmax, and returns the expected value. In our 8-fold CV, logit-EV improved ICC by approximately 0.02 over greedy argmax.
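As a minimal sketch of the decoding step described above, the snippet below turns seven score-token logits into an expected value and, for comparison, a greedy argmax score. It assumes the seven logits have already been pulled out of the last-position vocabulary logits; the names `logit_ev`, `greedy_score`, and `example_logits` are illustrative and are not taken from inference_example.py.

```python
import math

SCORES = [1, 2, 3, 4, 5, 6, 7]  # one entry per score token "1" .. "7"


def logit_ev(score_logits):
    """Softmax over the seven score logits, then the expected score."""
    if len(score_logits) != len(SCORES):
        raise ValueError("expected exactly one logit per score token")
    peak = max(score_logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in score_logits]
    total = sum(weights)
    return sum(s * w / total for s, w in zip(SCORES, weights))


def greedy_score(score_logits):
    """Score whose token has the largest logit (argmax decoding)."""
    best = max(range(len(SCORES)), key=lambda i: score_logits[i])
    return SCORES[best]


# Hypothetical logits, made up for illustration only.
example_logits = [0.1, 1.2, 2.5, 2.4, 0.3, -1.0, -2.0]
print(greedy_score(example_logits), round(logit_ev(example_logits), 2))
```

Greedy decoding discards the near-tie between "3" and "4" in this example, whereas the expected value keeps it as a fractional score.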
See inference_example.py for the full runnable example, including a CLI.
A vanilla transformers + peft + bitsandbytes loading path is technically possible but was not validated; scores may differ from the reported validation results.
Training
| setting | value |
|---|---|
| Base | unsloth/Qwen3.5-27B (4-bit NF4 via bitsandbytes, i.e. QLoRA) |
| Adapter | LoRA r=16, α=32, no dropout, target: q/k/v/o + gate/up/down_proj |
| Trainer | trl.ORPOTrainer, β=0.1, max_length=256, max_prompt_length=224 |
| Optimizer | AdamW, lr 5e-5, weight decay 0.01, warmup 0.05, cosine schedule |
| Effective batch | 8 (per_device=8, grad_accum=1) |
| Steps | 400 |
Per-fold artifacts (8-fold CV) trained on 8× RTX PRO 6000 Blackwell (96 GB) in ~31 min wall-clock.
Training data
Two human-scored English text sources:
| source | scale | notes |
|---|---|---|
| Jakob (2024) | 1–6 | social-media / forum posts; CC-BY 4.0 (n = 2,281) |
| IC scorer-training materials (UBC) | 1–7 | short anchor passages from the publicly downloadable IC training materials; see NOTICE for provenance and the chapter excluded by copyright |
ORPO preference pairs are auto-generated: the chosen response is the human ground-truth IC score; rejected responses are the ordinal neighbours at distances 1 and 2 (clipped to 1..7). No model predictions are used as distractors. Class imbalance (heavy IC=1) is corrected by repeating each example by sqrt(max_count / class_count).
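The pair construction and class re-weighting described above can be sketched as follows. This is an illustration under stated assumptions, not the training script: the function names are hypothetical, the raw text stands in for the real prompt template (which is not given here), and how fractional repeat factors are realised (rounding, sampling) is left open because the card does not specify it.

```python
import math
from collections import Counter


def rejected_scores(chosen, lo=1, hi=7):
    """Ordinal neighbours at distance 1 and 2 that stay inside [lo, hi]."""
    return [chosen + d for d in (-2, -1, 1, 2) if lo <= chosen + d <= hi]


def build_pairs(examples):
    """examples: list of (text, human_score). One pair per rejected neighbour."""
    pairs = []
    for text, score in examples:
        for rejected in rejected_scores(score):
            pairs.append(
                {
                    "prompt": text,  # placeholder for the real prompt template
                    "chosen": str(score),
                    "rejected": str(rejected),
                }
            )
    return pairs


def repeat_factors(scores):
    """Per-class repeat factor sqrt(max_count / class_count)."""
    counts = Counter(scores)
    max_count = max(counts.values())
    return {cls: math.sqrt(max_count / n) for cls, n in counts.items()}


print(rejected_scores(1))                 # neighbours below 1 are dropped
print(repeat_factors([1, 1, 1, 1, 2]))    # the rarer class gets the larger factor
```

The square root softens the correction: a class that is four times rarer is repeated twice, not four times.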
Evaluation
8-fold stratified CV on (IC × source). Each held-out example is scored by the adapter trained without it.
| decoder | group | ICC(2,1) | Pearson r | Spearman ρ |
|---|---|---|---|---|
| greedy | overall | 0.779 | 0.780 | 0.737 |
| logit-EV | overall | 0.797 | 0.810 | 0.756 |
| logit-EV | jakob (forum) | 0.775 | 0.788 | 0.740 |
| logit-EV | anchor exemplars (small subset) | 0.671 | 0.757 | 0.762 |
Overall, logit-EV beats greedy argmax by ~0.02 ICC (0.797 vs 0.779); we recommend the continuous channel for downstream regression / correlation work.
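For readers unfamiliar with ICC(2,1), the sketch below computes it from the standard two-way random-effects, absolute-agreement, single-rater formula, treating the human rating and the model score as two raters of each text. The function name `icc_2_1` and the toy data are illustrative; the card does not say which implementation produced the reported values.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a list of rows: one row per text, one column per rater."""
    n = len(ratings)       # number of texts
    k = len(ratings[0])    # number of raters (2 for human vs model)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


human = [1, 2, 3, 4]
shifted = [h + 1 for h in human]           # toy "model" with a constant +1 bias
print(icc_2_1(list(zip(human, human))))    # 1.0 for perfect agreement
print(icc_2_1(list(zip(human, shifted))))  # about 0.77, although Pearson r is 1
```

Because ICC(2,1) measures absolute agreement, it penalises systematic offsets that Pearson and Spearman ignore, which is why the ICC column in the table can sit below the correlation columns.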
The 8-fold CV partitions are stratified on (IC × source) so that each held-out example is scored by an adapter that did not see it. There is no human-rated IC ground truth available for the downstream texts the model is applied to in our published belief-change analyses, so the values above are the only validation of the scorer against human ratings.
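One way to picture the protocol is the sketch below: examples are grouped by an (IC, source) label, each group is spread round-robin over the folds, and every text is then scored by the scorer that was trained without its fold. The helper names and the round-robin scheme are assumptions made for illustration; the actual partitioning code lives in the reproducibility repository and may differ.

```python
import random
from collections import defaultdict


def stratified_folds(strata, n_folds=8, seed=0):
    """Return a fold index per example, spreading each stratum evenly."""
    by_stratum = defaultdict(list)
    for idx, label in enumerate(strata):
        by_stratum[label].append(idx)

    rng = random.Random(seed)
    fold_of = [None] * len(strata)
    offset = 0  # rotate the starting fold so fold sizes stay balanced
    for label in sorted(by_stratum):
        members = by_stratum[label]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            fold_of[idx] = (offset + pos) % n_folds
        offset += len(members)
    return fold_of


def out_of_fold_scores(texts, fold_of, scorers):
    """scorers[f] must be a callable trained WITHOUT the examples in fold f."""
    return [scorers[fold_of[i]](text) for i, text in enumerate(texts)]


# Toy strata: (IC score, source) labels, made up for illustration.
strata = [(1, "jakob")] * 16 + [(2, "anchor")] * 8
folds = stratified_folds(strata)
print(sorted(set(folds)))  # all eight fold indices are used
```

Pooling the out-of-fold scores across all eight folds gives one held-out prediction per example, which is what the pooled ICC is computed on.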
Limitations
- Language: trained only on English text. No claims about other languages.
- Domain: social-media / forum discourse + short anchor exemplars. Performance may degrade on highly technical or narrative text.
- Length: truncated at 1024 tokens. Very long passages are scored on the truncated prefix.
- Calibration: anchored to the Suedfeld 1–7 scale; absolute scores should be interpreted relative to the training distribution, not as universal "complexity units".
- Single-rater: the model outputs a single automated estimate per text. It should not be treated as a substitute for multiple trained human raters when consensus IC scores are required.
Reproducibility
The adapter shipped here corresponds to the model used for the IC measurements in the paper. The full reproducibility pipeline (data prep, CV evaluation, scoring) lives at https://github.com/tmadl/UserAwareAISafety.
trainer_state.json and training_args.bin are included for transparency.
Citation
If you use this model, please cite the paper and the Jakob 2024 corpus:
@misc{madl2026icscorer,
author = {Madl, Tamas},
title = {Text-measured cognitive complexity predicts belief revision in AI persuasion},
year = {2026},
howpublished = {PsyArXiv preprint},
url = {https://osf.io/preprints/psyarxiv/mdxvs_v1}
}
@misc{jakob2024ic,
author = {Jakob, Julia},
title = {The Integrative Complexity of Online User Comments
Across Different Types of Democracy and Discussion Arenas},
year = {2024},
doi = {10.17605/OSF.IO/NUQCJ},
url = {https://osf.io/nuqcj/overview}
}
Contact
Tamas Madl, tamas.madl@ofai.at
Austrian Research Institute for Artificial Intelligence (OFAI)