Paraphraser Checkpoints
This model collection contains paraphraser checkpoints trained for research on cost-escalation attacks against LLM routers. The models rewrite an input prompt with the goal of preserving meaning while changing router behavior toward a stronger and more expensive model.
All checkpoints are based on humarin/chatgpt_paraphraser_on_T5_base and are saved in standard transformers format.
Models
Uploaded checkpoints are expected under checkpoints/<MODEL_ID>/.
| Model ID | Type | Main difference from base training config |
|---|---|---|
| BASE | Baseline | Untrained upstream paraphraser used as baseline. |
| FINETUNED | Final model | Final multi-router RL model using the base config. |
| AGGRESSIVE | Aggressive model | Aggressive variant tuned for higher internal attack success. |
| ADDITIVE_ABLATION | Reward ablation | reward.sim_gate=false, making the routing and similarity reward additive. |
| LOW_W_LEN_ABLATION | Reward ablation | reward.w_len=0.1 instead of the default 0.3. |
| LOW_SIM_FLOOR_ABLATION | Reward ablation | reward.similarity.sim_floor=0.8. |
| HIGH_SIM_FLOOR_ABLATION | Reward ablation | reward.similarity.sim_floor=0.95. |
| NO_FLIP_BONUS_ABLATION | Reward ablation | reward.routing.flip_bonus=0. |
| NO_NLI_ABLATION | Reward ablation | reward.similarity.use_nli=false. |
| ZERO_W_SIM_ADDITIVE_ABLATION | Reward ablation | Removes the similarity term with reward.w_sim=0 and reward.sim_gate=false. |
| NO_CURRICULUM_ABLATION | Training ablation | curriculum.enabled=false. |
| SIMULTANEOUS_RL_ABLATION | Training ablation | router_schedule.mode=simultaneous, so all training routers contribute to the reward at once. |
| BERT_ONLY | Router-specific model | Best checkpoint trained against the RouteLLM BERT router. Sweep label: nli_simfloor0.9_wlen2_continuous_lowertemp. |
| CHAYAN_ONLY | Router-specific model | Best checkpoint trained against the Chayan router. Sweep label: sim_gate. |
| CAUSAL_ONLY | Router-specific model | Conservative best checkpoint trained against the RouteLLM causal router. Sweep label: beta001_floor085. |
| CAUSAL_ONLY_AGGRESSIVE | Router-specific model | Aggressive checkpoint trained against the RouteLLM causal router. |
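The reward ablations above toggle a few switches on a composite reward. This card does not give the exact formula, so the following is only a sketch under stated assumptions: the function name `sketch_reward` is hypothetical, the gate is assumed to zero out the routing term when similarity falls below the floor, and the length term (`reward.w_len`) is left out because its form is not described here. The keyword names mirror the config keys in the table.

```python
def sketch_reward(score_delta, flipped, similarity, *,
                  w_route=1.0, w_sim=1.0, sim_gate=True,
                  sim_floor=0.9, flip_bonus=1.0):
    """Hypothetical composite reward; the project's real formula may differ."""
    # Routing term: change in router score, plus a bonus when the routing
    # decision flips to the stronger model.
    routing = score_delta + (flip_bonus if flipped else 0.0)
    if sim_gate:
        # Assumed gating: the routing term only counts when the paraphrase
        # stays at or above the similarity floor.
        gate = 1.0 if similarity >= sim_floor else 0.0
        return w_route * routing * gate + w_sim * similarity
    # Additive variant: routing and similarity contribute independently.
    return w_route * routing + w_sim * similarity


# A flip achieved with a low-similarity paraphrase:
gated = sketch_reward(0.2, True, 0.5)
additive = sketch_reward(0.2, True, 0.5, sim_gate=False)
no_sim = sketch_reward(0.2, True, 0.5, sim_gate=False, w_sim=0.0)
```

Under these assumptions, the gated setting gives no routing credit to a paraphrase that drifts below the floor, while the additive setting still rewards the flip. With `w_sim=0` and no gate, similarity no longer affects the reward at all, which is the configuration labelled ZERO_W_SIM_ADDITIVE_ABLATION in the table.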
Loading
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Placeholder: replace with the Hub repository that hosts these checkpoints.
repo_id = "your-org/your-model-repo"
# Any Model ID from the table above except BASE.
model_id = "FINETUNED"

tokenizer = AutoTokenizer.from_pretrained(
    repo_id,
    subfolder=f"checkpoints/{model_id}",
)
model = AutoModelForSeq2SeqLM.from_pretrained(
    repo_id,
    subfolder=f"checkpoints/{model_id}",
)

# The "paraphrase: " prefix matches the input prefix used in training.
prompt = "paraphrase: What country hosted the 2014 FIFA World Cup?"
inputs = tokenizer(prompt, return_tensors="pt", truncation=True)

# Sampling settings match the rollout decoding used in training.
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=1.1,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
For the baseline model, load the upstream model directly:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

repo_id = "humarin/chatgpt_paraphraser_on_T5_base"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id)
```
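When looping over several Model IDs, the two loading paths can be folded into one helper. This is a minimal sketch: `from_pretrained_kwargs` is a hypothetical name, and it assumes the checkpoints/<MODEL_ID>/ layout described above, with BASE loaded from the upstream repository.

```python
UPSTREAM_REPO = "humarin/chatgpt_paraphraser_on_T5_base"


def from_pretrained_kwargs(model_id: str, repo_id: str) -> dict:
    """Return keyword arguments for `from_pretrained` for a given Model ID.

    Hypothetical helper, not part of the released code.
    """
    if model_id == "BASE":
        # The baseline is the untrained upstream paraphraser in its own repo.
        return {"pretrained_model_name_or_path": UPSTREAM_REPO}
    # Trained checkpoints are expected under checkpoints/<MODEL_ID>/.
    return {
        "pretrained_model_name_or_path": repo_id,
        "subfolder": f"checkpoints/{model_id}",
    }
```

The result can be unpacked into both loaders, for example `AutoTokenizer.from_pretrained(**kwargs)` and `AutoModelForSeq2SeqLM.from_pretrained(**kwargs)`, so the tokenizer and model always come from the same location.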
Training Setup
Unless a model row above lists an override, the main training settings were:
| Setting | Value |
|---|---|
| Base model | humarin/chatgpt_paraphraser_on_T5_base |
| Input prefix | paraphrase: |
| Max input length | 128 |
| Max generated tokens | 128 |
| Training data | Reduced training split of router-scored prompts |
| Validation data | Reduced validation split of router-scored prompts |
| RL algorithm | GRPO |
| Generations per prompt | 12 |
| KL beta | 0.05 |
| PPO clip epsilon | 0.3 |
| Epochs | 12 |
| Per-device batch size | 8 |
| Gradient accumulation | 4 |
| Initial learning rate | 5e-5 |
| Precision | bf16 |
| Rollout decoding | temperature 1.1, top-p 0.9 |
| Reward weights | w_route=1.0, w_sim=1.0, w_len=0.3 |
| Similarity gate | reward.sim_gate=true |
| Similarity floor | 0.9 |
| NLI filter | use_nli=true, nli_model=modernce_base |
| Routing reward | mode=score_delta, flip_bonus=1.0 |
| Curriculum | enabled=true, order=easy_to_hard |
| Training routers | routellm_bert, chayan, routellm_causal |
| Router schedule | epoch schedule over BERT, Chayan, and causal routers |
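The table lists GRPO with 12 generations per prompt. In GRPO as it is commonly described, the generations sampled for one prompt form a group, and each generation's advantage is its reward standardized within that group. The sketch below illustrates that idea only; the function name, the use of the population standard deviation, and the `eps` value are assumptions, and the trainer used for these checkpoints may normalize differently.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one prompt's group of generations."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every generation has the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy rewards for a group of 4 generations (the base config samples 12).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
flat = group_relative_advantages([0.7, 0.7, 0.7])
```

Generations that beat their group's mean get a positive advantage and the rest a negative one. A group whose generations all score the same yields zero advantages, so it contributes no policy gradient signal.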
Evaluation
Evaluation metrics are reported over internal training-family routers and held-out transferability routers.
- ASR is attack success rate, averaged over the reported routers.
- Mean sim is the average semantic similarity score.
- Above floor is the fraction of generations above the configured similarity floor, averaged over routers.
- Internal evaluation routers: routellm_causal, chayan, routellm_bert.
- Transferability routers: r2, routellm_mf, routellm_sw.
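The metric definitions above can be sketched as a small aggregation. This is an illustration rather than the project's evaluation code: the `summarize` name, the input layout (router name mapped to a list of `(attack_succeeded, similarity)` pairs), averaging mean similarity per router before averaging across routers, and treating "above floor" as `>=` are all assumptions. The router names in the toy data are placeholders.

```python
def summarize(per_router, sim_floor=0.9):
    """Average ASR, mean similarity, and above-floor rate over routers."""
    asr, sims, above = [], [], []
    for results in per_router.values():
        n = len(results)
        # Fraction of generations that the router counted as a successful attack.
        asr.append(sum(1 for ok, _ in results if ok) / n)
        # Average semantic similarity of the generations for this router.
        sims.append(sum(s for _, s in results) / n)
        # Fraction of generations at or above the similarity floor.
        above.append(sum(1 for _, s in results if s >= sim_floor) / n)
    k = len(per_router)
    return {
        "asr": sum(asr) / k,
        "mean_sim": sum(sims) / k,
        "above_floor": sum(above) / k,
    }


toy = {
    "router_a": [(True, 0.95), (False, 0.80)],
    "router_b": [(False, 0.92), (False, 0.97)],
}
summary = summarize(toy)
```

Averaging per router first gives each router equal weight even when the routers are evaluated on different numbers of generations.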
| Model ID | Internal ASR | Internal mean sim | Internal above floor | Transfer ASR | Transfer mean sim | Transfer above floor |
|---|---|---|---|---|---|---|
| ADDITIVE_ABLATION | 18.34% | 0.97 | 97.87% | 0.47% | 0.97 | 98.62% |
| AGGRESSIVE | 63.11% | 0.91 | 75.56% | 1.04% | 0.91 | 75.88% |
| BASE | 18.48% | 0.88 | 73.75% | 0.86% | 0.89 | 73.75% |
| BERT_ONLY | 17.29% | 0.93 | 89.16% | 0.65% | 0.93 | 89.69% |
| CAUSAL_ONLY | 18.59% | 0.89 | 70.78% | 0.68% | 0.89 | 69.50% |
| CAUSAL_ONLY_AGGRESSIVE | 57.31% | 0.84 | 34.86% | 0.79% | 0.84 | 34.96% |
| CHAYAN_ONLY | 27.16% | 0.87 | 65.36% | 1.22% | 0.88 | 64.93% |
| FINETUNED | 19.97% | 0.94 | 93.73% | 0.72% | 0.94 | 93.73% |
| HIGH_SIM_FLOOR_ABLATION | 11.92% | 0.99 | 99.26% | 0.18% | 0.98 | 99.36% |
| LOW_SIM_FLOOR_ABLATION | 25.42% | 0.87 | 70.24% | 0.86% | 0.89 | 70.88% |
| LOW_W_LEN_ABLATION | 20.83% | 0.94 | 92.99% | 0.86% | 0.94 | 93.94% |
| NO_CURRICULUM_ABLATION | 19.12% | 0.94 | 93.30% | 0.72% | 0.94 | 93.20% |
| NO_FLIP_BONUS_ABLATION | 20.20% | 0.94 | 92.99% | 0.54% | 0.95 | 94.26% |
| NO_NLI_ABLATION | 20.01% | 0.94 | 92.77% | 0.83% | 0.94 | 94.05% |
| SIMULTANEOUS_RL_ABLATION | 21.64% | 0.94 | 91.60% | 0.68% | 0.94 | 92.99% |
| ZERO_W_SIM_ADDITIVE_ABLATION | 65.92% | 0.27 | 0.21% | 0.00% | 0.27 | 0.43% |
Intended Use
These checkpoints are intended for controlled research on LLM-router robustness, semantic-preserving paraphrase generation, reward design, and transferability of router attacks. They should not be used to evade production routing, billing, or safety systems.
Limitations
The checkpoints optimize for router behavior under the project reward and
evaluation setup. High ASR does not imply good general paraphrasing quality, and
some variants intentionally sacrifice semantic preservation for ablation
purposes. In particular, ZERO_W_SIM_ADDITIVE_ABLATION demonstrates why the
similarity reward is necessary: it reaches high internal ASR but has very low
semantic similarity.
The transferability metrics are low across the reported models, indicating that behavior learned against the training routers does not strongly transfer to the held-out routers in this evaluation setting.