eye-grep tagger — deberta-v3-small

The accuracy-first tagger for eye-grep, a log colorizer that highlights ids, timestamps, IPs and repeated strings in server logs. It is a token classifier that labels each content token of a log line with one of 11 tags, letting a renderer color site-specific id formats it has never seen before.

Fine-tuned from microsoft/deberta-v3-small (SentencePiece). This is the champion model: it has the highest accuracy at the cost of a larger download, and the eye-grep CLI loads it by default. For an in-browser build, use the smaller, distilled opsbr/eye-grep-electra-small instead.

Tag schema (11 classes)

PUNCT WORD NUM RAND IP DURATION SIZE TIMESTAMP LEVEL URL PATH

RAND is a high-entropy id (uuid / hash / token); NUM, SIZE, DURATION are numeric values; TIMESTAMP, IP, URL, PATH, LEVEL are self-explanatory; WORD/PUNCT are ordinary text.
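
As a rough illustration of how a renderer might consume these tags, the sketch below maps each of the 11 tag names to an ANSI color and wraps tagged character spans of a log line. The palette, the `TAG_COLORS` table and the `colorize` helper are hypothetical; they are not eye-grep's actual colors or API.

```javascript
// Hypothetical palette: tag name -> ANSI SGR color code (0 = leave unstyled).
const TAG_COLORS = {
  PUNCT: 90, WORD: 0, NUM: 36, RAND: 35, IP: 33, DURATION: 32,
  SIZE: 32, TIMESTAMP: 34, LEVEL: 31, URL: 94, PATH: 96,
};

// Color each tagged span of `line`. `spans` is a list of { start, end, tag }
// sorted by `start`, with character offsets into `line` (end exclusive).
function colorize(line, spans) {
  let out = '';
  let cursor = 0;
  for (const { start, end, tag } of spans) {
    out += line.slice(cursor, start); // copy untagged text (e.g. whitespace) unchanged
    const code = TAG_COLORS[tag] ?? 0; // unknown tags fall back to unstyled
    const text = line.slice(start, end);
    out += code === 0 ? text : `\x1b[${code}m${text}\x1b[0m`;
    cursor = end;
  }
  return out + line.slice(cursor); // copy any trailing untagged text
}
```

Because the renderer keys only on the tag name, a site-specific id that the model labels RAND is colored the same way as a uuid or hash, with no per-format rule.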

Files

ONNX weights in the transformers.js layout: onnx/model.onnx (fp32) and onnx/model_quantized.onnx (int8, the default), plus the SentencePiece tokenizer.
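
The two weight files map onto a precision choice at load time. The sketch below encodes that mapping in a hypothetical helper, `onnxFileFor`, which is not part of transformers.js or eye-grep. The dtype names `fp32` and `q8` follow the transformers.js v3 naming as best understood here, which is an assumption to check against the version in use.

```javascript
// Hypothetical helper mirroring the file layout described above.
// Keys are assumed transformers.js v3 dtype names; values are file-name suffixes.
const DTYPE_SUFFIX = new Map([
  ['fp32', ''],          // onnx/model.onnx
  ['q8', '_quantized'],  // onnx/model_quantized.onnx (int8, the default)
]);

function onnxFileFor(dtype) {
  if (!DTYPE_SUFFIX.has(dtype)) {
    throw new Error(`no ONNX file listed for dtype "${dtype}"`);
  }
  return `onnx/model${DTYPE_SUFFIX.get(dtype)}.onnx`;
}

// With transformers.js v3, the same choice is assumed to be made through the
// `dtype` loading option, for example:
//   AutoModelForTokenClassification.from_pretrained(repoId, { dtype: 'fp32' });
```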

Usage

eye-grep CLI (this is the default model):

eye-grep --model opsbr/eye-grep-deberta-v3-small app.log
# private repo → export HF_TOKEN first

transformers.js:

import { AutoTokenizer, AutoModelForTokenClassification } from '@huggingface/transformers';
const tokenizer = await AutoTokenizer.from_pretrained('opsbr/eye-grep-deberta-v3-small');
const model = await AutoModelForTokenClassification.from_pretrained('opsbr/eye-grep-deberta-v3-small');
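
Loading the model is only the first half; the logits still have to be turned into tag names. A minimal sketch of that step follows, assuming a batch of one line and a flat, row-major logits buffer. `decodeTags` is a hypothetical helper, and the wiring in the trailing comment assumes the usual transformers.js tensor fields (`dims`, `data`) and `model.config.id2label`.

```javascript
// Pick the highest-scoring class for each subword token.
// `logits` holds numTokens * numClasses values in row-major order;
// `id2label` maps a class index to its tag name.
function decodeTags(logits, numTokens, numClasses, id2label) {
  const tags = [];
  for (let t = 0; t < numTokens; t++) {
    const row = t * numClasses;
    let best = 0;
    for (let c = 1; c < numClasses; c++) {
      if (logits[row + c] > logits[row + best]) best = c;
    }
    tags.push(id2label[best]);
  }
  return tags;
}

// Assumed wiring with the tokenizer and model loaded above (batch size 1):
//   const inputs = await tokenizer(line);
//   const { logits } = await model(inputs);
//   const [, numTokens, numClasses] = logits.dims;
//   const tags = decodeTags(logits.data, numTokens, numClasses, model.config.id2label);
```

The result is one tag per subword piece, including any special tokens the tokenizer adds, so it still needs to be mapped back onto content tokens before rendering.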

Training data

Fine-tuned on the synthetic, fully-owned opsbr/eye-grep gold set (Apache-2.0) — deterministically generated, no third-party log data.

Notes

Int8 dynamic quantization reduces the usefulness metric by only ~0.002 compared with fp32.
The tokenizer mirrors eye-grep's frozen train/spec.py; the model's subword predictions are aligned back onto content-token spans at inference time.
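
The alignment step in the last note can be sketched as follows. This is one plausible strategy, not necessarily the one eye-grep uses: each content token takes the tag of the first subword piece whose character range overlaps it. The function name, the input shapes and the `WORD` fallback are all assumptions, and how the subword character offsets are obtained is left out.

```javascript
// contentSpans: [{ start, end }] character spans of content tokens in the line.
// subwords: [{ start, end, tag }] character offsets and predicted tag for each
// subword piece, in left-to-right order. All `end` offsets are exclusive.
function alignToContentTokens(contentSpans, subwords) {
  return contentSpans.map(({ start, end }) => {
    // First subword whose range overlaps [start, end).
    const hit = subwords.find((sw) => sw.start < end && sw.end > start);
    // Falling back to WORD when nothing overlaps is an arbitrary choice for this sketch.
    return { start, end, tag: hit ? hit.tag : 'WORD' };
  });
}
```

For the line `id=ab12cd 5ms`, a content token `ab12cd` split into pieces `ab` (RAND) and `12cd` (NUM) would come out as RAND under this first-piece rule; a majority vote over the overlapping pieces would be an equally reasonable alternative.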
