
Paraphraser Checkpoints

This model collection contains paraphraser checkpoints trained for research on cost-escalation attacks against LLM routers. Each model rewrites an input prompt so that its meaning is preserved while the router becomes more likely to send it to the stronger, more expensive model.

All checkpoints are based on `humarin/chatgpt_paraphraser_on_T5_base` and are saved in the standard `transformers` format.

Models

Uploaded checkpoints are stored under `checkpoints/<MODEL_ID>/`, using the model IDs listed below.

| Model ID | Type | Main difference from the base training config |
|---|---|---|
| `BASE` | Baseline | Upstream paraphraser with no additional training, used as the baseline. |
| `FINETUNED` | Final model | Final multi-router RL model trained with the base config. |
| `AGGRESSIVE` | Aggressive model | Aggressive variant tuned for higher attack success on the internal routers. |
| `ADDITIVE_ABLATION` | Reward ablation | `reward.sim_gate=false`, so the routing and similarity rewards are added rather than gated. |
| `LOW_W_LEN_ABLATION` | Reward ablation | `reward.w_len=0.1` instead of the default `0.3`. |
| `LOW_SIM_FLOOR_ABLATION` | Reward ablation | `reward.similarity.sim_floor=0.8` instead of the default `0.9`. |
| `HIGH_SIM_FLOOR_ABLATION` | Reward ablation | `reward.similarity.sim_floor=0.95` instead of the default `0.9`. |
| `NO_FLIP_BONUS_ABLATION` | Reward ablation | `reward.routing.flip_bonus=0` instead of the default `1.0`. |
| `NO_NLI_ABLATION` | Reward ablation | `reward.similarity.use_nli=false`. |
| `ZERO_W_SIM_ADDITIVE_ABLATION` | Reward ablation | Removes the similarity term with `reward.w_sim=0` and `reward.sim_gate=false`. |
| `NO_CURRICULUM_ABLATION` | Training ablation | `curriculum.enabled=false`. |
| `SIMULTANEOUS_RL_ABLATION` | Training ablation | `router_schedule.mode=simultaneous`, so all training routers contribute to the reward at once. |
| `BERT_ONLY` | Router-specific model | Best checkpoint trained against the RouteLLM BERT router. Sweep label: `nli_simfloor0.9_wlen2_continuous_lowertemp`. |
| `CHAYAN_ONLY` | Router-specific model | Best checkpoint trained against the Chayan router. Sweep label: `sim_gate`. |
| `CAUSAL_ONLY` | Router-specific model | Conservative best checkpoint trained against the RouteLLM causal router. Sweep label: `beta001_floor085`. |
| `CAUSAL_ONLY_AGGRESSIVE` | Router-specific model | Aggressive checkpoint trained against the RouteLLM causal router. |
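The dotted keys in the table read like overrides applied on top of the base training config. The project's config system is not described in this card, so the following is only a sketch under that assumption: `BASE_CONFIG`, `ABLATIONS`, and `apply_overrides` are hypothetical names, the nesting of keys that the table does not spell out (and the `"epoch"` value for the default router schedule) is guessed, and the values come from the Models and Training Setup tables.

```python
import copy
from typing import Any

# Base settings from the Training Setup table as a nested dict.
# Nesting for keys the Models table does not spell out is an assumption.
BASE_CONFIG: dict[str, Any] = {
    "reward": {
        "w_route": 1.0,
        "w_sim": 1.0,
        "w_len": 0.3,
        "sim_gate": True,
        "similarity": {"sim_floor": 0.9, "use_nli": True, "nli_model": "modernce_base"},
        "routing": {"mode": "score_delta", "flip_bonus": 1.0},
    },
    "curriculum": {"enabled": True, "order": "easy_to_hard"},
    # Hypothetical value standing in for the default epoch-based router schedule.
    "router_schedule": {"mode": "epoch"},
}

# Ablation overrides copied from the Models table.
ABLATIONS: dict[str, dict[str, Any]] = {
    "ADDITIVE_ABLATION": {"reward.sim_gate": False},
    "LOW_W_LEN_ABLATION": {"reward.w_len": 0.1},
    "LOW_SIM_FLOOR_ABLATION": {"reward.similarity.sim_floor": 0.8},
    "HIGH_SIM_FLOOR_ABLATION": {"reward.similarity.sim_floor": 0.95},
    "NO_FLIP_BONUS_ABLATION": {"reward.routing.flip_bonus": 0},
    "NO_NLI_ABLATION": {"reward.similarity.use_nli": False},
    "ZERO_W_SIM_ADDITIVE_ABLATION": {"reward.w_sim": 0, "reward.sim_gate": False},
    "NO_CURRICULUM_ABLATION": {"curriculum.enabled": False},
    "SIMULTANEOUS_RL_ABLATION": {"router_schedule.mode": "simultaneous"},
}


def apply_overrides(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a deep copy of `base` with each dotted key set to its override value."""
    config = copy.deepcopy(base)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            if key not in node:
                raise KeyError(f"unknown config key: {dotted_key}")
            node = node[key]
        # Refuse to create new keys so a typo in an override fails loudly.
        if leaf not in node:
            raise KeyError(f"unknown config key: {dotted_key}")
        node[leaf] = value
    return config


cfg = apply_overrides(BASE_CONFIG, ABLATIONS["ZERO_W_SIM_ADDITIVE_ABLATION"])
print(cfg["reward"]["w_sim"], cfg["reward"]["sim_gate"])
```

Copying the base config before applying overrides keeps every ablation defined as a small diff against one shared set of defaults, which mirrors how the table describes each variant.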

Loading

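```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

repo_id = "MauroPello/llm-routing-attack-paraphrasers"
model_id = "FINETUNED"  # any model ID from the table above; see below for BASE

# Each checkpoint lives in its own subfolder of the repository.
tokenizer = AutoTokenizer.from_pretrained(
    repo_id,
    subfolder=f"checkpoints/{model_id}",
)
model = AutoModelForSeq2SeqLM.from_pretrained(
    repo_id,
    subfolder=f"checkpoints/{model_id}",
)

# Inputs use the same "paraphrase:" prefix and 128-token input limit as training.
prompt = "paraphrase: What country hosted the 2014 FIFA World Cup?"
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=128)

# Sampling settings match the rollout decoding used during training.
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=1.1,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```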

For the baseline model, load the upstream model directly:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

repo_id = "humarin/chatgpt_paraphraser_on_T5_base"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id)
```
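Because the trained checkpoints sit in `checkpoints/<MODEL_ID>/` subfolders while the baseline is loaded from the upstream repository, a small helper can pick the right `from_pretrained` arguments from a model ID. This is a sketch: `from_pretrained_kwargs` and `MODEL_IDS` are hypothetical names, not part of `transformers`, and the helper only builds keyword arguments without downloading anything.

```python
# Hypothetical helper: maps a model ID from the Models table to from_pretrained() kwargs.
REPO_ID = "MauroPello/llm-routing-attack-paraphrasers"
UPSTREAM_ID = "humarin/chatgpt_paraphraser_on_T5_base"

MODEL_IDS = frozenset({
    "BASE", "FINETUNED", "AGGRESSIVE",
    "ADDITIVE_ABLATION", "LOW_W_LEN_ABLATION", "LOW_SIM_FLOOR_ABLATION",
    "HIGH_SIM_FLOOR_ABLATION", "NO_FLIP_BONUS_ABLATION", "NO_NLI_ABLATION",
    "ZERO_W_SIM_ADDITIVE_ABLATION", "NO_CURRICULUM_ABLATION", "SIMULTANEOUS_RL_ABLATION",
    "BERT_ONLY", "CHAYAN_ONLY", "CAUSAL_ONLY", "CAUSAL_ONLY_AGGRESSIVE",
})


def from_pretrained_kwargs(model_id: str) -> dict[str, str]:
    """Return keyword arguments for AutoTokenizer/AutoModelForSeq2SeqLM.from_pretrained."""
    if model_id not in MODEL_IDS:
        raise ValueError(f"unknown model ID: {model_id!r}")
    if model_id == "BASE":
        # The baseline is the upstream paraphraser, loaded from its own repository.
        return {"pretrained_model_name_or_path": UPSTREAM_ID}
    # Every other checkpoint lives in checkpoints/<MODEL_ID>/ of this repository.
    return {"pretrained_model_name_or_path": REPO_ID, "subfolder": f"checkpoints/{model_id}"}


print(from_pretrained_kwargs("BERT_ONLY"))
```

For example, `AutoModelForSeq2SeqLM.from_pretrained(**from_pretrained_kwargs("BERT_ONLY"))` together with the matching `AutoTokenizer.from_pretrained(...)` call would load one router-specific checkpoint. Omitting `subfolder` for `BASE` (rather than passing `None`) keeps the call identical to the baseline snippet above, and raising on unknown IDs catches typos before any network request is made.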

Training Setup

Unless a model row above lists an override, the main training settings were:

| Setting | Value |
|---|---|
| Base model | `humarin/chatgpt_paraphraser_on_T5_base` |
| Input prefix | `paraphrase:` |
| Max input length (tokens) | `128` |
| Max generated tokens | `128` |
| Training data | Reduced training split of router-scored prompts |
| Validation data | Reduced validation split of router-scored prompts |
| RL algorithm | GRPO |
| Generations per prompt | `12` |
| KL beta | `0.05` |
| PPO clip epsilon | `0.3` |
| Epochs | `12` |
| Per-device batch size | `8` |
| Gradient accumulation steps | `4` |
| Initial learning rate | `5e-5` |
| Precision | `bf16` |
| Rollout decoding | temperature `1.1`, top-p `0.9` |
| Reward weights | `w_route=1.0`, `w_sim=1.0`, `w_len=0.3` |
| Similarity gate | `reward.sim_gate=true` |
| Similarity floor | `0.9` |
| NLI filter | `use_nli=true`, `nli_model=modernce_base` |
| Routing reward | `mode=score_delta`, `flip_bonus=1.0` |
| Curriculum | `enabled=true`, `order=easy_to_hard` |
| Training routers | `routellm_bert`, `chayan`, `routellm_causal` |
| Router schedule | Epoch-based schedule over the BERT, Chayan, and causal routers |
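GRPO drops PPO's learned value function and instead scores each paraphrase against the other generations sampled for the same prompt (12 per prompt in this setup). The sketch below illustrates the standard sequence-level formulation in plain Python, using the table's clip epsilon (`0.3`) and KL beta (`0.05`). It is not the project's training code: it treats each paraphrase as one log-probability rather than per-token terms, the KL term uses the common `k3` estimator (an assumption about this project), and the reward values are made up.

```python
import math
import statistics

CLIP_EPS = 0.3  # "PPO clip epsilon" from the Training Setup table
KL_BETA = 0.05  # "KL beta" from the Training Setup table


def group_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Group-relative advantages: standardize rewards within one prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division finite when every generation got the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


def grpo_sample_objective(logp_new: float, logp_old: float, logp_ref: float, advantage: float) -> float:
    """Clipped surrogate minus KL penalty for one sampled paraphrase (to be maximized).

    Log-probabilities are of the whole paraphrase under the current policy,
    the policy that sampled it, and the frozen reference model.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - CLIP_EPS), 1.0 + CLIP_EPS)
    surrogate = min(ratio * advantage, clipped_ratio * advantage)
    # k3 estimator of KL(policy || reference); always non-negative.
    log_ref_ratio = logp_ref - logp_new
    kl = math.exp(log_ref_ratio) - log_ref_ratio - 1.0
    return surrogate - KL_BETA * kl


# Made-up rewards for the 12 generations of a single prompt.
rewards = [0.0, 0.0, 1.2, 0.5, 0.0, 2.0, 0.3, 0.0, 0.0, 1.0, 0.0, 0.8]
advantages = group_advantages(rewards)
print([round(a, 2) for a in advantages])
```

Generations that beat their group's average reward get a positive advantage and the rest a negative one, so no separate critic is needed. The clipped term removes any incentive to push a sample's probability ratio beyond `[0.7, 1.3]`, and the KL penalty keeps the policy close to the reference paraphraser.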

Evaluation

Metrics are reported for two groups of routers: the internal routers from the training family (`routellm_causal`, `chayan`, `routellm_bert`) and held-out transferability routers (`r2`, `routellm_mf`, `routellm_sw`).

- ASR: attack success rate, averaged over the routers in the group.
- Mean sim: average semantic similarity between the original prompts and their paraphrases.
- Above floor: fraction of generations whose similarity is above the configured similarity floor, averaged over routers.
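Under the definitions above, each metric can be computed per router and then macro-averaged over the routers in a group. This is a minimal sketch, assuming a hypothetical per-router list of `(attack_succeeded, similarity)` records, one per generated paraphrase. How a single attack is judged successful is decided by the project's evaluation code and is not reproduced here; averaging mean similarity per router, and counting a score equal to the floor as above it, are also assumptions. The toy data is illustrative only.

```python
from statistics import fmean

SIM_FLOOR = 0.9  # default similarity floor from the Training Setup table


def router_metrics(records: list[tuple[bool, float]], sim_floor: float = SIM_FLOOR) -> dict[str, float]:
    """ASR, mean similarity, and above-floor fraction for one router.

    Each record is (attack_succeeded, similarity) for one generated paraphrase.
    """
    return {
        "asr": fmean(1.0 if ok else 0.0 for ok, _ in records),
        "mean_sim": fmean(sim for _, sim in records),
        # Treating a score equal to the floor as "above" it is an assumption.
        "above_floor": fmean(1.0 if sim >= sim_floor else 0.0 for _, sim in records),
    }


def group_metrics(per_router: dict[str, list[tuple[bool, float]]], sim_floor: float = SIM_FLOOR) -> dict[str, float]:
    """Macro-average each metric over the routers in one group (internal or transfer)."""
    rows = [router_metrics(records, sim_floor) for records in per_router.values()]
    return {key: fmean(row[key] for row in rows) for key in rows[0]}


# Toy records for two hypothetical routers (not project results).
toy = {
    "router_a": [(True, 0.95), (False, 0.92), (False, 0.85), (True, 0.91)],
    "router_b": [(False, 0.95), (False, 0.92), (True, 0.85), (False, 0.91)],
}
print(group_metrics(toy))
```

Macro-averaging gives each router equal weight regardless of how many prompts it scored, which matches the "averaged over the reported routers" wording.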

| Model ID | Internal ASR | Internal mean sim | Internal above floor | Transfer ASR | Transfer mean sim | Transfer above floor |
|---|---|---|---|---|---|---|
| `ADDITIVE_ABLATION` | 18.34% | 0.97 | 97.87% | 0.47% | 0.97 | 98.62% |
| `AGGRESSIVE` | 63.11% | 0.91 | 75.56% | 1.04% | 0.91 | 75.88% |
| `BASE` | 18.48% | 0.88 | 73.75% | 0.86% | 0.89 | 73.75% |
| `BERT_ONLY` | 17.29% | 0.93 | 89.16% | 0.65% | 0.93 | 89.69% |
| `CAUSAL_ONLY` | 18.59% | 0.89 | 70.78% | 0.68% | 0.89 | 69.50% |
| `CAUSAL_ONLY_AGGRESSIVE` | 57.31% | 0.84 | 34.86% | 0.79% | 0.84 | 34.96% |
| `CHAYAN_ONLY` | 27.16% | 0.87 | 65.36% | 1.22% | 0.88 | 64.93% |
| `FINETUNED` | 19.97% | 0.94 | 93.73% | 0.72% | 0.94 | 93.73% |
| `HIGH_SIM_FLOOR_ABLATION` | 11.92% | 0.99 | 99.26% | 0.18% | 0.98 | 99.36% |
| `LOW_SIM_FLOOR_ABLATION` | 25.42% | 0.87 | 70.24% | 0.86% | 0.89 | 70.88% |
| `LOW_W_LEN_ABLATION` | 20.83% | 0.94 | 92.99% | 0.86% | 0.94 | 93.94% |
| `NO_CURRICULUM_ABLATION` | 19.12% | 0.94 | 93.30% | 0.72% | 0.94 | 93.20% |
| `NO_FLIP_BONUS_ABLATION` | 20.20% | 0.94 | 92.99% | 0.54% | 0.95 | 94.26% |
| `NO_NLI_ABLATION` | 20.01% | 0.94 | 92.77% | 0.83% | 0.94 | 94.05% |
| `SIMULTANEOUS_RL_ABLATION` | 21.64% | 0.94 | 91.60% | 0.68% | 0.94 | 92.99% |
| `ZERO_W_SIM_ADDITIVE_ABLATION` | 65.92% | 0.27 | 0.21% | 0.00% | 0.27 | 0.43% |

Intended Use

These checkpoints are intended for controlled research on LLM-router robustness, semantic-preserving paraphrase generation, reward design, and transferability of router attacks. They should not be used to evade production routing, billing, or safety systems.

Limitations

The checkpoints optimize router behavior under this project's reward and evaluation setup. High ASR does not imply good general paraphrasing quality, and some variants intentionally sacrifice semantic preservation for ablation purposes. In particular, `ZERO_W_SIM_ADDITIVE_ABLATION` shows why the similarity reward is necessary: it reaches the highest internal ASR (65.92%) but has very low semantic similarity (mean 0.27, with only 0.21% of generations above the floor).

Transferability is weak for every reported model: transfer ASR is at most 1.22% (`CHAYAN_ONLY`), indicating that behavior learned against the training routers does not strongly transfer to the held-out routers in this evaluation setting.


Model tree: `google-t5/t5-base` (base model) → `humarin/chatgpt_paraphraser_on_T5_base` (fine-tuned) → `MauroPello/llm-routing-attack-paraphrasers` (this model).