Escarda-86M-Identity

An identity-tuned chat variant of Escarda-86M (SFT epoch 3): a ~86M-parameter SpikeWhaleLM model (JEPA + HRM refinement) with the custom ChatML-aware SpikeTokenizer. It knows it is "Escarda" and answers in a clean assistant style.

Usage

The architecture and tokenizer are both custom, so load them with trust_remote_code=True:

from transformers import AutoModelForCausalLM, AutoTokenizer
tok = AutoTokenizer.from_pretrained("Quazim0t0/Escarda-86M-Identity", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Quazim0t0/Escarda-86M-Identity", trust_remote_code=True)

Prompts use the ChatML format (<|im_start|>role\n…<|im_end|>), with generation starting after a trailing <|im_start|>assistant\n.
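As a rough illustration of the prompt layout described above, the following sketch assembles a ChatML string from a list of messages. The helper name build_chatml_prompt is hypothetical, and the newline placed after each <|im_end|> is an assumption borrowed from common ChatML usage rather than something this card specifies.

```python
def build_chatml_prompt(messages):
    """Render a list of {"role", "content"} dicts as one ChatML prompt string.

    Hypothetical helper for illustration; not part of the model repository.
    """
    parts = []
    for message in messages:
        # One closed turn: <|im_start|>role\ncontent<|im_end|>
        # The trailing "\n" between turns is an assumption (common ChatML practice).
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    # Leave the assistant turn open so generation continues from this point.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([{"role": "user", "content": "Who are you?"}])
```

The resulting string can then be tokenized with tok and passed to the model for generation. Suitable sampling settings are not stated in this card.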

Architecture

Built on SpikeWhaleLM (~86M params, 16 layers, hidden 640, 4096 context, 16,512 vocab, tied embeddings): Multi-head Latent Attention (LoRA-rank-128 Q/O, decoupled RoPE-16 + NoPE-48, multi-query, QK-norm), an engram n-gram memory, ×2 hash-lookup layers, hyper-connections, HRM refinement, a Multi-Token-Prediction training head, and a JEPA (Joint-Embedding Predictive) auxiliary objective. The Escarda family uses both HRM refinement and JEPA (use_hrm_refine=True, use_jepa=True).
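A little arithmetic helps put two of the figures above in context. The sketch below assumes that RoPE-16 and NoPE-48 are per-head dimensions that are concatenated to form the query/key head, as in other decoupled-RoPE attention designs. It also assumes that tied embeddings means a single vocab-by-hidden matrix shared between input and output. The variable names are illustrative and are not taken from the model's config.

```python
# Figures quoted in the architecture summary above.
vocab_size = 16_512
hidden_size = 640
rope_dims = 16   # assumed: per-head dims that receive rotary position encoding
nope_dims = 48   # assumed: per-head dims left without position encoding

# With decoupled RoPE, the two parts are concatenated per head.
qk_head_dim = rope_dims + nope_dims

# Tied embeddings: one shared matrix serves as both the input embedding
# and the output projection, so it is counted once.
embedding_params = vocab_size * hidden_size

print(f"query/key head dim: {qk_head_dim}")
print(f"shared embedding parameters: {embedding_params:,}")
```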

Tokenizer

SpikeTokenizer: a custom byte-level "length-max" (greedy longest-match) tokenizer with a 16,512-token vocab and ChatML-aware atomic special tokens. Ships as a PreTrainedTokenizer subclass and loads via AutoTokenizer + trust_remote_code.
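To make "greedy longest-match" concrete, here is a minimal sketch of the general technique over a toy byte vocabulary. It is not the SpikeTokenizer implementation. The function name, the toy vocabulary, and the single-byte fallback are all assumptions made for illustration.

```python
def greedy_longest_match(text, vocab, special_tokens):
    """Split text into byte pieces, always taking the longest vocab match.

    Illustrative only: `vocab` is a set of bytes objects, and any byte with
    no longer match is emitted on its own (assumed byte-level fallback).
    """
    data = text.encode("utf-8")
    specials = sorted(
        (s.encode("utf-8") for s in special_tokens), key=len, reverse=True
    )
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(data):
        # Special tokens are checked first at each position and kept atomic.
        special = next((s for s in specials if data.startswith(s, i)), None)
        if special is not None:
            tokens.append(special)
            i += len(special)
            continue
        # Try the longest candidate first, shrinking until a vocab entry
        # matches; a single byte is always accepted as the fallback.
        for length in range(min(max_len, len(data) - i), 0, -1):
            piece = data[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens


toy_vocab = {b"he", b"hell", b"hello", b"lo", b" ", b"wor", b"world"}
pieces = greedy_longest_match(
    "<|im_start|>hello world<|im_end|>",
    toy_vocab,
    ["<|im_start|>", "<|im_end|>"],
)
```

Here "hello" is emitted as one piece even though "he" and "hell" are also in the toy vocabulary, which is the defining behavior of longest-match segmentation.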

Evaluation

Zero-shot, full validation/test splits (acc = raw continuation log-likelihood, acc_norm = byte-length-normalized).

Task            acc      acc_norm
ARC-Easy        0.3262   0.3380
ARC-Challenge   0.2048   0.2415
HellaSwag       0.2785   0.2818
WinoGrande      0.5020   n/a
PIQA            0.5539   0.5462
OpenBookQA      0.1360   0.2440
BoolQ           0.4174   n/a
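The difference between acc and acc_norm in the table comes down to how the winning answer is chosen. The sketch below shows one plausible reading of the description above: acc picks the continuation with the highest total log-likelihood, and acc_norm divides each log-likelihood by the continuation's UTF-8 byte length first. The function name and the scores are made up for illustration and are not outputs of this model.

```python
def pick_choices(loglikelihoods, continuations):
    """Return (raw_pick, normalized_pick) as indices into the answer choices."""
    indices = range(len(continuations))
    # acc: highest total log-likelihood wins, which favors short continuations.
    raw_pick = max(indices, key=lambda i: loglikelihoods[i])
    # acc_norm: compare log-likelihood per UTF-8 byte instead.
    normalized_pick = max(
        indices,
        key=lambda i: loglikelihoods[i] / len(continuations[i].encode("utf-8")),
    )
    return raw_pick, normalized_pick


# Made-up scores: the longer answer is less likely overall but more likely per byte.
choices = ["yes", "absolutely not"]
scores = [-6.0, -14.0]
raw_pick, normalized_pick = pick_choices(scores, choices)
```

With these illustrative numbers the two rules disagree. The same effect may explain why acc and acc_norm diverge on tasks such as OpenBookQA.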

ArithMark-2.0 (AxiomicLabs), where the official metric is raw acc: 0.3628 (the strongest of the Escarda family).

Language modeling: WikiText-2 byte-ppl ↓ 2.7062 · BLiMP ↑ 0.7133.
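This card does not define byte-ppl. A common definition, assumed here, is the exponential of the total negative log-likelihood (in nats) divided by the number of UTF-8 bytes in the evaluated text. Unlike token-level perplexity, this makes scores comparable across tokenizers. The helper below is a sketch of that definition, not the evaluation script used for the number above.

```python
import math


def byte_perplexity(total_nll_nats, text):
    """exp(total negative log-likelihood / UTF-8 byte count) under the assumed definition."""
    n_bytes = len(text.encode("utf-8"))
    return math.exp(total_nll_nats / n_bytes)


# Sanity check with a synthetic value: if a model spent exactly ln(2) nats
# per byte on a 5-byte string, its byte perplexity would be 2.
example = byte_perplexity(5 * math.log(2), "hello")
```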

Powers the live demo: Escarda-86M-Chat Space.

Citation

If you use this model, please cite:

@misc{escarda86midentity,
  title        = {Escarda-86M-Identity: A ~86M-parameter SpikeWhaleLM},
  author       = {Dean Byrne (Quazim0t0)},
  year         = {2026},
  howpublished = {HuggingFace, \url{https://huggingface.co/Quazim0t0/Escarda-86M-Identity}},
  note         = {Quazim0t0/Escarda-86M-Identity}
}