VectraYX-Nano v14 (Experimental)

⚠️ Experimental release. v14 is the first nano checkpoint that emits tool-call syntax at a non-trivial rate (B4 = 0.16, versus a floor of 0.000 for v2/v4/v6/v10). It is trained on top of v10's Chinchilla-optimal pretrain (~894 M processed tokens) with an SFT mixture rebalanced toward a denser, curated tool corpus. The B5 conversational gate holds at 0.70 (down from 0.80 on v10), and B1 (CVE keyword recall) recovers to 0.337. For production use, prefer the v7 headline release at jsantillana/vectrayx-nano.

VectraYX-Nano v14

A 42M-parameter Spanish-first language model for cybersecurity, optimized for Latin America, with native tool-call output.

Author website: https://jsantillana.com


| | |
|---|---|
| Params | 41.95 M |
| Architecture | Decoder-only Transformer · 8 layers · 8 heads (2 KV) · RoPE · SwiGLU · QK-Norm · tied embeddings |
| Context | 1,024 tokens |
| Tokenizer | SentencePiece BPE, 16,384 vocab (special tokens for chat + cyber: `<\|user\|>`, `<\|assistant\|>`, `<\|cve\|>`, `<\|tool_call\|>`, `<\|/tool_call\|>`, etc.) |
| Languages | Spanish (primary), Portuguese, English (technical terms) |
| Pretrain tokens | ~894 M processed tokens (≈ 21 tok/param, Chinchilla-optimal); inherits the v10 pretrain |
| SFT | v14 recipe: 6 epochs over the curated `tool_sft_mini_v1.jsonl` (2,801 examples) + `sft_conversational.jsonl` + `oasst1_es.jsonl`. Excludes the uncurated `tooluse_dataset.jsonl` (v1–v6 corpus), which had diluted v13. Tool-exposure-per-example ≈ 1.53× (vs v13's 0.38×). |
| Hardware | 1× NVIDIA A10G (SageMaker ml.g5.xlarge) · BF16 · ~30 min SFT-only on top of the v10 phase-3 checkpoint |
| License | Apache 2.0 |

Benchmarks

The B1–B5 evaluation suite is designed to test Spanish cybersecurity knowledge and chat register at the nano scale (results are in bench_v14.json in this repo).

| Benchmark | v14 | v10 (previous experimental) | v2 paper headline (N=4) | Notes |
|---|---|---|---|---|
| B1 CVE Q&A (keyword) | 0.337 | 0.307 | 0.226 ± 0.065 | Best nano result on B1 |
| B2 Classification (f1_macro) | 0.205 | 0.200 | 0.196 ± 0.014 | Capacity-bound at 42 M |
| B3 Commands (tool_match) | 0.029 | 0.000 | 0.029 ± 0.000 | Recovered to v2 baseline |
| B4 Tool-use | 0.160 | 0.000 | 0.230 ± 0.052 (v7) | First nano > 0 without LoRA; v7 with 4-seed mean reaches 0.23 |
| B5 Conversational gate | 0.700 | 0.800 | 0.775 ± 0.043 | Slight regression vs v10 (SFT mix favored tools) |

Single-seed (seed=42). For multi-seed B1–B5 with confidence intervals see the paper §8 Tables 7–8.
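As a reading aid, the v14 versus v10 deltas can be recomputed from the table above. This is a minimal sketch that only restates the published single-seed scores; the `scores` and `deltas` names are illustrative and it does not read `bench_v14.json`, whose schema is not described here.

```python
# Scores copied from the benchmark table above (single seed, seed=42).
scores = {
    "B1": {"v14": 0.337, "v10": 0.307},
    "B2": {"v14": 0.205, "v10": 0.200},
    "B3": {"v14": 0.029, "v10": 0.000},
    "B4": {"v14": 0.160, "v10": 0.000},
    "B5": {"v14": 0.700, "v10": 0.800},
}

# Positive delta means v14 scores higher than v10 on that benchmark.
deltas = {name: round(s["v14"] - s["v10"], 3) for name, s in scores.items()}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.3f}")
```

Because both columns are single-seed, small deltas (for example on B2) should not be read as significant differences.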

Quick start (HuggingFace `transformers`)

```python
from transformers import AutoModelForCausalLM
import sentencepiece as spm
import torch

# Load model (custom_code; requires trust_remote_code)
model = AutoModelForCausalLM.from_pretrained(
    "jsantillana/vectrayx-nano-experimental",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).eval()

# Tokenizer is SentencePiece (no HF tokenizer wrapper yet)
sp = spm.SentencePieceProcessor()
sp.load("tokenizer.model")  # download alongside the repo

# Chat format expected by the model
prompt = "<|user|>¿Qué es un ataque de phishing?<|end|><|assistant|>"
ids = torch.tensor([sp.encode(prompt)])
out = model.generate_simple(ids, max_new_tokens=200, temperature=0.7, top_k=40)
print(sp.decode(out[0].tolist()))
```
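The chat layout above can be assembled with a small helper instead of hand-written strings. The sketch below is an assumption-laden convenience, not part of the repo: `build_prompt` is a hypothetical name, it handles a single turn only, and the optional `<|system|>` segment follows the layout used in the tool-call example of this card.

```python
from typing import Optional


def build_prompt(user_message: str, system: Optional[str] = None) -> str:
    """Assemble a single-turn prompt in the chat layout shown in this card."""
    parts = []
    if system:
        # Optional system segment, placed before the user turn.
        parts.append(f"<|system|>{system}<|end|>")
    parts.append(f"<|user|>{user_message}<|end|>")
    # The trailing assistant tag cues the model to start its reply.
    parts.append("<|assistant|>")
    return "".join(parts)


prompt = build_prompt("¿Qué es un ataque de phishing?")
print(prompt)
```

The resulting string can be passed to `sp.encode(...)` exactly like the hand-written prompt.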

Tool-call output format

v14 emits structured tool calls when the system prompt advertises tools. The wire format is:

```
<|tool_call|>{"name": "<tool_name>", "arguments": {<args>}}<|/tool_call|>
```

Example prompt:

```python
SYSTEM = """Eres VectraYX-Nano. Tienes acceso a estas herramientas:
[
  {"name": "search_cve", "description": "Look up a CVE by ID", "parameters": {"cve_id": "string"}},
  {"name": "nmap_scan", "description": "Run nmap against a target", "parameters": {"target": "string", "ports": "string"}}
]
Cuando necesites una herramienta emite <|tool_call|>{...}<|/tool_call|>."""

prompt = f"<|system|>{SYSTEM}<|end|><|user|>Busca el CVE-2024-1234<|end|><|assistant|>"
```

Empirical B4 score: 0.16 — the model emits the bracketed format reliably, though argument selection is approximate at 42 M params (better at larger scales; see the Pro 3B / Analyst 7B paper rows).
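On the consuming side, the wire format can be parsed with a regular expression plus `json.loads`. This is a minimal sketch, assuming the special tokens survive decoding as literal text in the model output; `extract_tool_calls` and `TOOL_CALL_RE` are hypothetical names, and malformed payloads are silently skipped rather than repaired.

```python
import json
import re

# Matches the payload between the opening and closing tool-call tags.
TOOL_CALL_RE = re.compile(r"<\|tool_call\|>(.*?)<\|/tool_call\|>", re.DOTALL)


def extract_tool_calls(text: str) -> list:
    """Return every well-formed tool call found in decoded model output."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # envelope present, but the JSON inside is malformed
        # Keep only objects with a string name and an object of arguments.
        if (
            isinstance(call, dict)
            and isinstance(call.get("name"), str)
            and isinstance(call.get("arguments"), dict)
        ):
            calls.append(call)
    return calls


sample = '<|tool_call|>{"name": "search_cve", "arguments": {"cve_id": "CVE-2024-1234"}}<|/tool_call|>'
print(extract_tool_calls(sample))
```

Parsing only checks the shape of the call; it says nothing about whether the tool name or arguments are correct.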

Quick start (Ollama / llama.cpp)

⚠️ GGUF / Ollama support is currently broken. VectraYX-Nano uses QK-Norm (per-head-dim RMSNorm applied before RoPE) which matches the Qwen3 architecture on paper, but llama.cpp's Qwen3 implementation has subtle differences (likely in build_qkv tensor layout or attention scale) that produce garbage output when loading our GGUF. Switching to arch=llama drops QK-Norm and degrades output to "mostly coherent then diverges". A clean fix requires either:

Adding a vectrayx arch to llama.cpp upstream (~6–10 h C++ work + PR review), or

Re-training v14 without QK-Norm so the model becomes natively arch=llama compatible.

Both options are tracked but out of scope for this experimental release. For now, use the HuggingFace transformers path above; PyTorch inference works correctly. Track the issue here if you want an update.

Intended use

Designed for: defensive security education, cyber-incident triage assistance, CVE summarization in Spanish, FAQ for SOC analysts in LATAM, embedded chat in DevSecOps tooling, tool-call dispatch in MCP-aware agents.
Out of scope: factual Q&A about events post-2024, code generation beyond shell snippets, long-context reasoning (>1 k tokens), English chat.
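Since the context window is 1,024 tokens, prompt and generation budget together have to fit inside it. The sketch below shows one possible way to trim an over-long prompt; `fit_to_context` is a hypothetical helper, and keeping the most recent tokens is an assumed policy that can cut off a system prompt at the start.

```python
CONTEXT_TOKENS = 1024  # context length stated in this card


def fit_to_context(token_ids, max_new_tokens, context=CONTEXT_TOKENS):
    """Keep the most recent prompt tokens so prompt + generation fits the window."""
    budget = context - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Negative slicing keeps the tail; shorter prompts are returned unchanged.
    return token_ids[-budget:]


long_prompt = list(range(2000))  # stand-in for sp.encode(prompt)
trimmed = fit_to_context(long_prompt, max_new_tokens=200)
print(len(trimmed))
```

If the system prompt must survive, a variant that keeps the head and trims the middle of the conversation would be a better fit.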

Known limitations

Tool-call arguments are approximate. v14's B4=0.16 means the model emits the <|tool_call|>...<|/tool_call|> envelope correctly, but the argument content can be hallucinated and the model may pick the wrong tool name. Treat outputs as suggestions, not authoritative dispatch. Validate against your tool registry before execution.
Capacity-bound at 42 M params. B2 classification stays at the harness floor (0.20). For higher-fidelity tool use see the larger-tier checkpoints in the paper (Base 260M, Pro 3B, Analyst 7B).
No safety RLHF — the model can be steered to produce harmful security-related content. Run behind a safety filter for production.
Hallucinates LATAM institutional facts (DIVINDAT founding date, INDECOPI regulations, ANPD/LGPD article numbers, etc.). A LATAM-specific corpus was experimented with in v16 (full SFT — showed catastrophic forgetting) and v17 (LoRA — showed insufficient knowledge internalization at 3 K examples); neither is released. Robust LATAM factuality requires either a substantially larger LATAM corpus or training a larger base model with LATAM in pretrain (Base 260M v2 work in progress).
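The advice to validate tool calls against a registry before execution can be sketched as follows. Everything here is illustrative: `TOOL_REGISTRY` is a hypothetical registry mirroring the two tools from the example system prompt, `validate_tool_call` is not part of the repo, and treating every declared parameter as required is an assumption.

```python
# Hypothetical registry: tool name -> expected argument names and Python types.
TOOL_REGISTRY = {
    "search_cve": {"cve_id": str},
    "nmap_scan": {"target": str, "ports": str},
}


def validate_tool_call(call, registry=TOOL_REGISTRY):
    """Return a list of problems; an empty list means the call passed the checks."""
    name = call.get("name")
    if name not in registry:
        return [f"unknown tool: {name!r}"]
    args = call.get("arguments")
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    expected = registry[name]
    problems = []
    for key in args:
        if key not in expected:
            problems.append(f"unexpected argument: {key}")
    for key, expected_type in expected.items():
        if key not in args:
            problems.append(f"missing argument: {key}")
        elif not isinstance(args[key], expected_type):
            problems.append(f"wrong type for argument: {key}")
    return problems


call = {"name": "search_cve", "arguments": {"cve_id": "CVE-2024-1234"}}
print(validate_tool_call(call))
```

A call should only be dispatched when the returned list is empty; even then, value-level checks (for example restricting scan targets to an allowlist) remain the caller's responsibility.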

Training recipe

v14 = v10 pretrain checkpoint + clean SFT with curated tool corpus.

| Stage | Mix | Source | Purpose |
|---|---|---|---|
| v10 P1 | 100 % OpenSubtitles-ES | Helsinki-NLP/open_subtitles | Spanish chat register |
| v10 P2 | corpus_nano tech (NVD, Wiki-cyber, blogs, papers, malware, exploits) | corpus_nano.tar.gz | Cybersecurity domain |
| v10 P3 | glaive_fc_v2 + code_alpaca_bash + codefeedback_bash + exploitdb + github_repos | HuggingFace + corpus_nano | Function-calling + bash |
| v14 SFT | sft_conversational + oasst1_es + tool_sft_mini_v1 (curated, 2,801 ex) | local + curated | Tool-format + conv (6 ep) |

Pretrain budget: ~894 M processed tokens (≈ 21 tok/param at 42 M params, i.e. Chinchilla-optimal). v14 SFT runs ~30 min on top of v10's P3 checkpoint.
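The tokens-per-parameter figure can be checked with a line of arithmetic. The snippet uses the rounded numbers quoted in this card (~894 M tokens, 41.95 M parameters), so the result is approximate; the commonly cited Chinchilla rule of thumb is roughly 20 tokens per parameter.

```python
pretrain_tokens = 894e6  # ~894 M processed tokens (from this card)
params = 41.95e6         # parameter count (from this card)

tokens_per_param = pretrain_tokens / params
print(f"{tokens_per_param:.1f} tokens per parameter")  # prints "21.3 tokens per parameter"
```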

Citation

@misc{santillana2026vectrayx,
  author = {Santillana, Juan},
  title = {VectraYX-Nano: a 42M-parameter Spanish-first cybersecurity language model with native tool use},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/jsantillana/vectrayx-nano-experimental},
}

Authors

Juan Santillana — DevOps engineer at Globant.
