Model Card for mpapucci/Qwen3-4B-Wikipedia-Controllable-Text-Simplification-25000-42

This is an Italian model fine-tuned for Controllable Text Simplification. This version was trained on the Wikipedia split of IMPaCTS, using 25,000 different original texts, each paired with up to 10 different target simplifications.

Model Details

The base model was fine-tuned using LoRA in 16-bit precision.
20 new tokens were added to the base model vocabulary; they are used to control the target readability of the output. During LoRA training, the embedding and unembedding layers of the base model were left unfrozen so that they could learn representations for these tokens.

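from peft import LoraConfig

lora_cfg = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Apply LoRA to the attention and MLP projections.
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    # Train the embedding and output layers in full so the new tokens get learned representations.
    modules_to_save=["embed_tokens", "lm_head"],
)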
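As a rough illustration of the vocabulary extension described above, the following sketch shows how control tokens of this kind might be registered before training. It is not the authors' training script: the helper names `readability_tokens` and `register_control_tokens` are hypothetical, and the 0 to 100 range in steps of 5 is an assumption read off the token list in this card. The functions expect a Hugging Face tokenizer and model to be passed in.

```python
def readability_tokens(low=0, high=100, step=5):
    """Build control-token strings such as <|readability_20|> (assumed range)."""
    return [f"<|readability_{value}|>" for value in range(low, high + 1, step)]


def register_control_tokens(tokenizer, model):
    """Add the control tokens to the tokenizer and resize the model embeddings.

    `tokenizer` and `model` are expected to be Hugging Face objects; this
    mirrors the resize call used in the usage example of this card.
    """
    tokens = readability_tokens()
    # add_tokens returns how many tokens were actually new to the vocabulary.
    added = tokenizer.add_tokens(tokens, special_tokens=True)
    # New embedding rows are created for the added tokens and must be trained.
    model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=8)
    return added
```

Because the new embedding rows start untrained, keeping `embed_tokens` and `lm_head` in `modules_to_save` is what lets the model learn what each token means.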

Model Description

The control tokens for targeting the output readability score are:

<|readability_0|>
<|readability_5|>
<|readability_10|>
<|readability_15|>
<|readability_20|>
<|readability_25|>
<|readability_30|>
<|readability_35|>
<|readability_40|>
<|readability_45|>
<|readability_50|>
<|readability_55|>
<|readability_60|>
<|readability_65|>
<|readability_70|>
<|readability_75|>
<|readability_80|>
<|readability_85|>
<|readability_90|>
<|readability_95|>
<|readability_100|>

These tokens represent the target readability that the model tries to achieve. The input should be structured as <|readability_target|>\noriginal_italian_sentence\n. The model will try to generate a simplification at the target readability; a higher readability score corresponds to a more complex sentence, so aim for low readability values to obtain simpler output.
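The input structure described above can be assembled with a small helper. This is a sketch under stated assumptions: `control_token` and `build_prompt` are hypothetical names, and snapping an arbitrary score to the nearest multiple of 5 is an illustrative convenience, not something the model card prescribes.

```python
def control_token(target: int) -> str:
    """Return the control token closest to an integer target score (0-100)."""
    if not 0 <= target <= 100:
        raise ValueError("target readability must be between 0 and 100")
    # Snap to the nearest multiple of 5, since tokens exist only at those values.
    snapped = ((target + 2) // 5) * 5
    return f"<|readability_{snapped}|>"


def build_prompt(sentence: str, target: int) -> str:
    """Format one input: control token, newline, original sentence, newline."""
    return f"{control_token(target)}\n{sentence.strip()}\n"


prompt = build_prompt("Il film esce solo nel 1998.", 20)
print(repr(prompt))
```

The resulting string can be passed to the model as-is; the trailing newline marks the point where generation of the simplification begins.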

Uses

These models are intended as a simplification system for Italian sentences, in which a user can generate simplifications at the target readability suited to the intended reader. This can be useful, for example, for generating simplifications for primary school students with different reading levels, or for people learning Italian.

How to Get Started with the Model

The model can be used as follows:

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from peft import PeftModel

# Load the tokenizer from the fine-tuned repository.
tokenizer = AutoTokenizer.from_pretrained("mpapucci/Qwen3-4B-Wikipedia-Controllable-Text-Simplification-25000-42")

# If you need padding, ensure that these two lines are uncommented:
# tokenizer.pad_token = tokenizer.eos_token
# tokenizer.padding_side = "left"

# Load the base model and resize its embeddings to match the tokenizer.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Base",
    device_map="auto",
)
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=8)
model.config.vocab_size = len(tokenizer)

# Attach the LoRA adapter to the base model.
model = PeftModel.from_pretrained(model, "mpapucci/Qwen3-4B-Wikipedia-Controllable-Text-Simplification-25000-42")

# Each input is: control token, newline, original sentence, newline.
messages = [
    "<|readability_20|>\nProdotto dalla BBC, il film esce solo nel 1998 ed ottiene numerosi riconoscimenti internazionali, tra cui la candidatura al Premio Oscar per il miglior cortometraggio animato.\n",
]

pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128,
)

sequences = pipe(messages)

print(sequences)

When providing the text, add the desired control token for readability as the first token of the sentence that needs to be simplified.
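The pipeline output normally contains the prompt followed by the generated continuation, so a small post-processing step can isolate the simplification. The sketch below assumes that the output text starts with the prompt and that the first non-empty generated line is the simplification; `extract_simplification` is a hypothetical helper, not part of the released code.

```python
def extract_simplification(prompt: str, generated_text: str) -> str:
    """Return the first non-empty line generated after the prompt."""
    # Drop the echoed prompt if present; otherwise treat everything as output.
    if generated_text.startswith(prompt):
        continuation = generated_text[len(prompt):]
    else:
        continuation = generated_text
    # Assumption: the model may keep generating after the simplification,
    # so only the first non-empty line is kept.
    lines = [line.strip() for line in continuation.splitlines() if line.strip()]
    return lines[0] if lines else ""


prompt = "<|readability_20|>\nFrase originale.\n"
output = prompt + "Frase semplice.\nTesto in eccesso."
print(extract_simplification(prompt, output))
```

With a list of prompts as input, each pipeline result is expected to be a list of dictionaries with a `generated_text` key, so the helper would be applied to `result[0]["generated_text"]` for each prompt; check the shape of the printed output before relying on this.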

More Details

An extensive explanation of how the model was trained and how it performs can be found in the LREC 2026 paper.

Citation

If you use any of these models, please cite:

@inproceedings{papucci-etal-2026-controllable,
    title = {Controllable Sentence Simplification in Italian: Fine-Tuning Large Language Models on Automatically Generated Resources},
    author = {Papucci, Michele and Venturi, Giulia and Dell'Orletta, Felice},
    booktitle = {Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)},
    month = {May},
    year = {2026},
    pages = {7178--7191},
    address = {Palma, Mallorca, Spain},
    publisher = {European Language Resources Association (ELRA)},
    doi = {10.63317/5fgm358dfxt5},
    abstract = {This paper presents a study on readability-controlled Sentence Simplification for Italian, addressing the scarcity of annotated resources for low-resource languages. We introduce IMPaCTS (Italian Multilevel Parallel Corpus for Text Simplification), the first fully automatically created corpus of 1,444,160 original–simple sentence pairs automatically annotated with readability levels and linguistic features. It was generated using an Italian LLM prompted in zero-shot to produce multiple simplifications per input sentence. Increasing portions of the resource are used to fine-tune mono- and multilingual open-weight LLMs, conditioning them to generate simplifications at a target readability level. Results from automatic and human evaluations show that fine-tuning on IMPaCTS improves performance both in terms of task completion and adherence to the targeted readability levels compared to few-shot baselines.}
}