Instructions to use Quazim0t0/Escarda-86M-Identity with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Quazim0t0/Escarda-86M-Identity with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Quazim0t0/Escarda-86M-Identity", trust_remote_code=True)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Quazim0t0/Escarda-86M-Identity", trust_remote_code=True, dtype="auto")

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use Quazim0t0/Escarda-86M-Identity with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Quazim0t0/Escarda-86M-Identity"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Quazim0t0/Escarda-86M-Identity",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/Quazim0t0/Escarda-86M-Identity

SGLang

How to use Quazim0t0/Escarda-86M-Identity with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Quazim0t0/Escarda-86M-Identity" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Quazim0t0/Escarda-86M-Identity",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Quazim0t0/Escarda-86M-Identity" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Quazim0t0/Escarda-86M-Identity",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use Quazim0t0/Escarda-86M-Identity with Docker Model Runner:
```
docker model run hf.co/Quazim0t0/Escarda-86M-Identity
```

Escarda-86M-Identity

An identity-tuned chat variant of Escarda-86M (SFT epoch 3) — a ~86M-parameter SpikeWhaleLM model (JEPA + HRM refinement) with the custom ChatML-aware SpikeTokenizer. It knows it is "Escarda" and answers in a clean assistant style.

Usage

Custom architecture + tokenizer — load with trust_remote_code=True:

from transformers import AutoModelForCausalLM, AutoTokenizer
tok = AutoTokenizer.from_pretrained("Quazim0t0/Escarda-86M-Identity", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Quazim0t0/Escarda-86M-Identity", trust_remote_code=True)

Architecture

Built on SpikeWhaleLM (~86M params, 16 layers, hidden 640, 4096 context, 16,512 vocab, tied embeddings): Multi-head Latent Attention (LoRA-rank-128 Q/O, decoupled RoPE-16 + NoPE-48, multi-query, QK-norm), an engram n-gram memory, ×2 hash-lookup layers, hyper-connections, HRM refinement, a Multi-Token-Prediction training head, and a JEPA (Joint-Embedding Predictive) auxiliary objective — the Escarda family uses both HRM refinement and JEPA (use_hrm_refine=True, use_jepa=True).

Tokenizer

SpikeTokenizer — a custom byte-level "length-max" (greedy longest-match) tokenizer with a 16,512-token vocab and ChatML-aware atomic special tokens. Ships as a PreTrainedTokenizer subclass and loads via AutoTokenizer + trust_remote_code.

Evaluation

Zero-shot, full validation/test splits (acc = raw continuation log-likelihood, acc_norm = byte-length-normalized).

Task	acc	acc_norm
ARC-Easy	0.3262	0.3380
ARC-Challenge	0.2048	0.2415
HellaSwag	0.2785	0.2818
WinoGrande	0.5020	—
PIQA	0.5539	0.5462
OpenBookQA	0.1360	0.2440
BoolQ	0.4174	—

ArithMark-2.0 (AxiomicLabs) — official metric is raw acc: 0.3628 (the strongest of the Escarda family).

Language modeling: WikiText-2 byte-ppl ↓ 2.7062 · BLiMP ↑ 0.7133.

Powers the live demo: Escarda-86M-Chat Space.

Citation

If you use this model, please cite:

@misc{escarda86midentity,
  title        = {Escarda-86M-Identity: A ~86M-parameter SpikeWhaleLM},
  author       = {Dean Byrne (Quazim0t0)},
  year         = {2026},
  howpublished = {HuggingFace, \url{https://huggingface.co/Quazim0t0/Escarda-86M-Identity}},
  note         = {Quazim0t0/Escarda-86M-Identity}
}

Downloads last month: 202

Safetensors

Model size

97.3M params

Tensor type

F32