Model Card for mpapucci/Qwen3-4B-Wikipedia-Controllable-Text-Simplification-25000-42

This model is fine-tuned for Controllable Text Simplification in Italian. It is the Wikipedia-trained version, fine-tuned on the Wikipedia split of IMPaCTS using 25000 distinct original texts, each paired with up to 10 different target simplifications.

Model Details

The base model was fine-tuned with LoRA in 16-bit precision.
A set of 20 new tokens, used to control the target readability of the output, was added to the base model's vocabulary. During LoRA training, the base model's embedding and unembedding layers were left unfrozen so that they could learn representations for these tokens.

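from peft import LoraConfig

lora_cfg = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Adapt all attention and MLP projection layers.
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    # Train the embedding and unembedding layers in full so that
    # the new control tokens get learned representations.
    modules_to_save=["embed_tokens", "lm_head"],
)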
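
To make the numbers in this configuration concrete, here is a small sketch of the arithmetic behind them. It assumes the standard (non rank-stabilized) LoRA scaling of lora_alpha / r and the usual count of r * (d_in + d_out) trainable parameters per adapted linear layer. The function names and the layer dimensions in the example are hypothetical placeholders, not the actual Qwen3-4B shapes.

```python
def lora_scaling(r: int, lora_alpha: int) -> float:
    """Scaling factor applied to the low-rank update B @ A (standard LoRA: alpha / r)."""
    return lora_alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer: A is (r, d_in), B is (d_out, r)."""
    return r * d_in + d_out * r


# Values taken from the LoraConfig above.
R, ALPHA = 32, 64
scaling = lora_scaling(R, ALPHA)

# Hypothetical square projection, used only to illustrate the formula.
example_params = lora_param_count(d_in=1024, d_out=1024, r=R)

print(scaling, example_params)
```

This count covers only the low-rank adapters. The modules listed in modules_to_save (embed_tokens and lm_head) are trained in full, so their parameters come on top of it.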

Model Description

The control tokens for targeting the output readability score are:

<|readability_0|>
<|readability_5|>
<|readability_10|>
<|readability_15|>
<|readability_20|>
<|readability_25|>
<|readability_30|>
<|readability_35|>
<|readability_40|>
<|readability_45|>
<|readability_50|>
<|readability_55|>
<|readability_60|>
<|readability_65|>
<|readability_70|>
<|readability_75|>
<|readability_80|>
<|readability_85|>
<|readability_90|>
<|readability_95|>
<|readability_100|>

These tokens represent the target readability that the model tries to achieve. The input should be structured as <|readability_target|>\noriginal_italian_sentence\n. The model will try to generate a simplification at the target readability, where a higher readability score means a more complex sentence. For simpler output, aim for low readability values.
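
The input format described above can be sketched as a pair of small helpers. The names control_token and build_prompt are hypothetical, and snapping an arbitrary target to the nearest available level is a convenience assumed here, not something the model itself does.

```python
CONTROL_LEVELS = list(range(0, 101, 5))  # 0, 5, ..., 100, as in the token list above


def control_token(target: float) -> str:
    """Return the control token whose level is closest to `target` (clamped to 0..100)."""
    clamped = min(max(target, 0), 100)
    level = min(CONTROL_LEVELS, key=lambda lv: abs(lv - clamped))
    return f"<|readability_{level}|>"


def build_prompt(sentence: str, target: float) -> str:
    """Format the model input: control token, newline, sentence, newline."""
    return f"{control_token(target)}\n{sentence.strip()}\n"


prompt = build_prompt("Ciao mondo.", 22)
print(repr(prompt))
```

The resulting string can be passed to the model as is. Lower targets request simpler output, following the convention stated above.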

Developed by: Michele Papucci
Language(s) (NLP): Italian
Finetuned from model: Qwen3-4B-Base

Model Sources

Data: IMPaCTS - Wikipedia Split
Paper: Controllable Sentence Simplification in Italian: Fine-Tuning Large Language Models on Automatically Generated Resources

Uses

These models aim to serve as a simplification system for Italian sentences, in which a user can generate a simplification at the target readability suited to the intended reader. This can be useful, for example, for generating simplifications for primary school students with different reading levels, or for people learning Italian.

How to Get Started with the Model

The model can be used as follows:

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("mpapucci/Qwen3-4B-Wikipedia-Controllable-Text-Simplification-25000-42")

# If you need padding, ensure that these two lines are uncommented:
# tokenizer.pad_token = tokenizer.eos_token
# tokenizer.padding_side = "left"

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Base",
    device_map="auto",
)
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=8)
model.config.vocab_size = len(tokenizer)
model = PeftModel.from_pretrained(model, "mpapucci/Qwen3-4B-Wikipedia-Controllable-Text-Simplification-25000-42")

# Each prompt is a control token, a newline, the sentence to simplify, and a final newline.
text = "<|readability_20|>\nProdotto dalla BBC, il film esce solo nel 1998 ed ottiene numerosi riconoscimenti internazionali, tra cui la candidatura al Premio Oscar per il miglior cortometraggio animato.\n"
messages = [text]

pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128,
)

# The pipeline returns one list of generations per input prompt.
sequences = pipe(messages)

print(sequences)

When providing the text, add the desired control token for readability as the first token of the sentence that needs to be simplified.
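
The text-generation pipeline echoes the prompt in generated_text by default, so some post-processing is usually needed to isolate the simplification. The following is a minimal sketch of one way to do it, run on a mocked pipeline output. The helper name extract_simplification is hypothetical, the Italian continuation is invented for illustration, and keeping only the first generated line is an assumption based on the newline-terminated input format.

```python
def extract_simplification(prompt: str, generated_text: str) -> str:
    """Strip the echoed prompt (if present) and keep only the first non-empty generated line."""
    if generated_text.startswith(prompt):
        continuation = generated_text[len(prompt):]
    else:
        continuation = generated_text
    lines = [line.strip() for line in continuation.splitlines() if line.strip()]
    return lines[0] if lines else ""


# Mocked output with the shape returned for a list of string prompts:
# one list of candidate dicts per prompt.
prompt = "<|readability_20|>\nFrase originale.\n"
mock_sequences = [[{"generated_text": prompt + "Frase semplice.\nTesto in più."}]]

simplified = [
    extract_simplification(prompt, candidates[0]["generated_text"])
    for candidates in mock_sequences
]
print(simplified)
```

If the model tends to keep generating after the first line, lowering max_new_tokens is another option to consider.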

More Details

An extensive explanation of how the model was trained and how it performs can be found in the LREC 2026 paper.

Citation

If you use any of these models, please cite:

@inproceedings{papucci-etal-2026-controllable,
    title = {Controllable Sentence Simplification in Italian: Fine-Tuning Large Language Models on Automatically Generated Resources},
    author = {Papucci, Michele and Venturi, Giulia and Dell'Orletta, Felice},
    booktitle = {Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)},
    month = {May},
    year = {2026},
    pages = {7178--7191},
    address = {Palma, Mallorca, Spain},
    publisher = {European Language Resources Association (ELRA)},
    doi = {10.63317/5fgm358dfxt5},
    abstract = {This paper presents a study on readability-controlled Sentence Simplification for Italian, addressing the scarcity of annotated resources for low-resource languages. We introduce IMPaCTS (Italian Multilevel Parallel Corpus for Text Simplification), the first fully automatically created corpus of 1,444,160 original–simple sentence pairs automatically annotated with readability levels and linguistic features. It was generated using an Italian LLM prompted in zero-shot to produce multiple simplifications per input sentence. Increasing portions of the resource are used to fine-tune mono- and multilingual open-weight LLMs, conditioning them to generate simplifications at a target readability level. Results from automatic and human evaluations show that fine-tuning on IMPaCTS improves performance both in terms of task completion and adherence to the targeted readability levels compared to few-shot baselines.}
}
