Granite Switch 4.0 350M — CTI Technique Mapping

A Granite Switch model: the ibm-granite/granite-4.0-350m base with a single LoRA adapter — cti_technique_mapping — embedded into one switchable checkpoint and fired by a control token.

It is the small-model analogue of ibm-granite/granite-switch-4.1-3b-preview: same Granite Switch composition machinery (control tokens, KV-hiding, chat-template integration), but built on the 350M Granite 4.0 base and carrying one CTI adapter instead of the preview's granitelib adapter library.

What the adapter does

cti_technique_mapping maps a piece of cyber threat intelligence (CTI) text — a sentence or short passage describing adversary behavior — to the single best-matching MITRE ATT&CK technique ID (e.g. T1059, T1566.001).

The adapter's I/O contract (io_configs/cti_technique_mapping/io.yaml) constrains the output to a single technique-ID string matching ^T[0-9]{4}(\.[0-9]{3})?$ and specifies greedy decoding with max_completion_tokens: 16.
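The output pattern in the I/O contract can also be checked on the client side before a prediction is stored. The following is a minimal sketch using only the Python standard library; the helper name `parse_technique_id` is hypothetical and not part of granite-switch.

```python
import re

# Pattern copied from the adapter's I/O contract:
# "T" + four digits, optionally followed by "." + three digits (sub-technique).
TECHNIQUE_ID = re.compile(r"^T[0-9]{4}(\.[0-9]{3})?$")


def parse_technique_id(text):
    """Return (technique, sub_technique) for a valid ID, else None.

    Hypothetical helper for illustration; not part of granite-switch.
    """
    candidate = text.strip()
    if TECHNIQUE_ID.match(candidate) is None:
        return None
    technique, _, sub = candidate.partition(".")
    return technique, (sub or None)


print(parse_technique_id("T1059"))       # technique only
print(parse_technique_id("T1566.001"))   # technique plus sub-technique
print(parse_technique_id("PowerShell"))  # rejected, not a technique ID
```

Returning None for anything that fails the pattern lets the caller decide whether to retry, log, or drop the prediction instead of silently storing free text.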

The underlying LoRA scored 96.67% exact-match (290/300) on the held-out CTI validation set.
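For reference, exact-match is the fraction of predictions that equal their reference label. The sketch below assumes a plain string comparison after stripping whitespace; the normalization used in the actual evaluation is not stated here, and the toy data is invented for illustration.

```python
def exact_match(predictions, references):
    """Fraction of predictions equal to their reference after whitespace stripping."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot score an empty set")
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


# Toy data, invented for illustration only (2 of 3 correct).
preds = ["T1059.001", "T1566.001", "T1003"]
golds = ["T1059.001", "T1566.002", "T1003"]
print(f"{exact_match(preds, golds):.2%}")  # 66.67%

# The reported figure corresponds to 290 correct out of 300.
print(f"{290 / 300:.2%}")  # 96.67%
```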

Composition summary


| Property | Value |
| --- | --- |
| Base model | `ibm-granite/granite-4.0-350m` (`granitemoehybrid`) |
| Embedded adapters | 1: `cti_technique_mapping` (LoRA) |
| Control token | `<\|cti_technique_mapping\|>` (id `100352`) |
| LoRA rank / alpha | 16 / 32 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `input_linear`, `output_linear` |
| Base params | 352,379,904 |
| Composed params | 355,362,816 (+0.85%) |
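The overhead figure can be reproduced from the two parameter counts. The sketch below also shows the standard LoRA conventions as assumptions: a scaling factor of alpha / rank, and rank × (d_in + d_out) added parameters per adapted linear layer. The helper `lora_params` is hypothetical, and the difference between the two counts may include more than the LoRA matrices (for example a control-token embedding); no breakdown is given here.

```python
BASE_PARAMS = 352_379_904
COMPOSED_PARAMS = 355_362_816
LORA_RANK = 16
LORA_ALPHA = 32

added = COMPOSED_PARAMS - BASE_PARAMS
overhead = added / BASE_PARAMS

# Standard LoRA scaling (assumes no rank-stabilized variant is in use).
scaling = LORA_ALPHA / LORA_RANK


def lora_params(d_in, d_out, rank=LORA_RANK):
    """Parameters one LoRA pair adds to a d_in -> d_out linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    Hypothetical helper for illustration.
    """
    return rank * (d_in + d_out)


print(added)              # 2982912
print(f"{overhead:.2%}")  # 0.85%
print(scaling)            # 2.0
```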

Usage

The control token activates the adapter. With the Granite Switch HF backend:

from granite_switch.hf import GraniteSwitchForCausalLM
from transformers import AutoTokenizer

model_id = "barha/granite-switch-4.0-350m-cti"
tok = AutoTokenizer.from_pretrained(model_id)
model = GraniteSwitchForCausalLM.from_pretrained(model_id, device_map="auto")

cti = "The actor used PowerShell to download and execute a payload from a remote server."
messages = [{"role": "user", "content": cti}]
# The chat template inserts <|cti_technique_mapping|> to fire the adapter.
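# return_dict=True yields input_ids plus attention_mask regardless of the
# default in the installed transformers version.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=16, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))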
# -> e.g. "T1059.001"

For fast inference, deploy with vLLM (see the granite-switch docs).

How it was built

Composed with granite_switch.composer.compose_granite_switch:

python -m granite_switch.composer.compose_granite_switch \
  --base-model ibm-granite/granite-4.0-350m \
  --adapters <path-to>/cti_technique_mapping \
  --technology lora \
  --output <out>

Note on granitemoehybrid: all Granite 4.0 / Nano models are granitemoehybrid configs (with num_local_experts=0), whose MLP leaves are the fused input_linear / output_linear rather than the dense gate_proj / up_proj / down_proj. The CTI LoRA was therefore trained targeting input_linear / output_linear (plus the attention q_proj / k_proj / v_proj / o_proj) so that it composes cleanly. A blanket all-linear target would produce phantom gate_proj / up_proj / down_proj weights, which the composer rejects.
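A quick pre-flight check along these lines can catch a target-module mismatch before composing. This is a sketch only: the module names are assumed for illustration and not read from the checkpoint, the suffix-matching rule is an assumption modeled on how PEFT resolves a list of target_modules, and `unmatched_targets` is a hypothetical helper, not part of the composer.

```python
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj", "input_linear", "output_linear"]
DENSE_MLP_NAMES = ["gate_proj", "up_proj", "down_proj"]

# Assumed module names for one decoder layer (illustrative only).
layer_modules = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.shared_mlp.input_linear",
    "model.layers.0.shared_mlp.output_linear",
]


def matches(name, target):
    """True if the module name is the target or ends with '.<target>'."""
    return name == target or name.endswith("." + target)


def unmatched_targets(module_names, targets):
    """Targets that match no module; adapter weights for these have nothing to attach to."""
    return [t for t in targets if not any(matches(n, t) for n in module_names)]


print(unmatched_targets(layer_modules, TARGET_MODULES))   # every target is present
print(unmatched_targets(layer_modules, DENSE_MLP_NAMES))  # none of the dense names exist
```

In practice the module list would come from `model.named_modules()` on the loaded base model instead of a hand-written list.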

Files

model.safetensors — composed base + embedded LoRA weights
config.json — model_type: granite_switch
adapter_index.json — adapter → control-token mapping
io_configs/cti_technique_mapping/io.yaml — adapter I/O contract (output schema, decoding params)
chat_template.jinja — control-token-aware chat template
compose_report.json, BUILD.md — full composition provenance

License

Apache-2.0.
