Pengin-compact-v0.1b

A compact, quantized extractive question-answering model built to run entirely in the browser via Transformers.js (WebAssembly / WebGPU). Inference needs no server and no API key, and the text being analyzed never leaves the device.

This is the first public release from Pengin AI, purpose-built for on-device document intelligence.

What it does

Given a passage of text and a question, the model returns the answer as a verbatim span of the passage, along with a confidence score and character offsets for highlighting.

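// Load Transformers.js from a CDN. Run this inside <script type="module">
// so that top-level await is available.
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.17.2';

// Download the model on first use and build a question-answering pipeline.
const qa = await pipeline('question-answering', 'penginlabs/Pengin-compact-v0.1b');

// Arguments are (question, context).
const result = await qa(
  'What is the net income?',
  'Net income for the period was $847 million, up 11.3% year-over-year.'
);
// → e.g. { answer: '$847 million', score: 0.97, start: 30, end: 42 }
//   (the score is illustrative; start/end are character offsets into the passage)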
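
The character offsets are what a UI needs to highlight the answer in the passage. The sketch below is a minimal illustration, assuming the result object may not always carry start and end (this can depend on the Transformers.js version); in that case it falls back to locating the answer string in the passage. The helper names locateSpan and highlight, and the [[ ]] markers, are hypothetical and not part of Transformers.js.

```javascript
// Hypothetical helpers (not part of Transformers.js).

// Return { start, end } character offsets for the answer in `context`,
// or null if the answer cannot be located.
function locateSpan(context, result) {
  // Prefer offsets supplied with the result, if they match the answer text.
  if (Number.isInteger(result.start) && Number.isInteger(result.end) &&
      context.slice(result.start, result.end) === result.answer) {
    return { start: result.start, end: result.end };
  }
  // Fallback: first verbatim occurrence of the answer string.
  const start = context.indexOf(result.answer);
  return start === -1 ? null : { start, end: start + result.answer.length };
}

// Wrap the located span in markers (plain-text brackets by default).
function highlight(context, span, open = '[[', close = ']]') {
  if (!span) return context;
  return context.slice(0, span.start) + open +
         context.slice(span.start, span.end) + close +
         context.slice(span.end);
}

const context = 'Net income for the period was $847 million, up 11.3% year-over-year.';
const span = locateSpan(context, { answer: '$847 million', score: 0.97 });
console.log(span);                      // { start: 30, end: 42 }
console.log(highlight(context, span));
// Net income for the period was [[$847 million]], up 11.3% year-over-year.
```

If the answer text occurs more than once in the passage, the fallback picks the first occurrence, which may not be the span the model selected.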

Key properties

Property         Value
Architecture     DistilBERT (distilled)
Format           Quantized ONNX (int8)
Size             ~65 MB
Context window   512 tokens
Inference        Browser-native (WebAssembly)
Task             Extractive QA
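
Because the context window is 512 tokens, longer documents need to be split before inference. The sketch below shows one possible approach, not a documented method of this model: it splits the text into overlapping word windows and keeps the highest-scoring answer. Word counts are only a rough stand-in for tokens, and the defaults (maxWords = 250, overlap = 50) and the helper names chunkWords and answerLongText are assumptions. The qa argument is expected to be a function with the (question, context) signature of the pipeline shown earlier.

```javascript
// Hypothetical helpers; word counts only approximate the 512-token limit.

// Split text into overlapping windows of at most `maxWords` words.
function chunkWords(text, maxWords = 250, overlap = 50) {
  if (overlap >= maxWords) {
    throw new RangeError('overlap must be smaller than maxWords');
  }
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += maxWords - overlap) {
    chunks.push(words.slice(i, i + maxWords).join(' '));
    if (i + maxWords >= words.length) break; // last window reached the end
  }
  return chunks;
}

// Run `qa(question, chunk)` on every chunk and keep the best-scoring result.
async function answerLongText(qa, question, text, maxWords = 250, overlap = 50) {
  let best = null;
  for (const chunk of chunkWords(text, maxWords, overlap)) {
    const result = await qa(question, chunk);
    if (!best || result.score > best.score) best = { ...result, chunk };
  }
  return best; // null when the text is empty
}
```

Scores are computed independently for each chunk, so comparing them across chunks is a heuristic. Any offsets apply to the returned chunk, whose whitespace has been normalized, rather than to the original text.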

Intended use

  • Financial document extraction
  • Contract and policy Q&A
  • Any scenario where source text is on-device and answers must be cited verbatim
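
For extraction use cases like these, one pattern is to ask a fixed set of questions against the same passage and keep only confident answers that appear verbatim in the source. The following is a sketch under assumptions: the helper name extractFields, the field names, and the minScore threshold of 0.5 are illustrative and not taken from this model card. The qa argument is any function with the (question, context) signature shown earlier.

```javascript
// Hypothetical helper: ask several questions about one passage.
// `questions` maps a field name to a question string.
async function extractFields(qa, context, questions, minScore = 0.5) {
  const fields = {};
  for (const [name, question] of Object.entries(questions)) {
    const { answer, score } = await qa(question, context);
    // Keep the answer only if it is confident and quoted verbatim.
    const verbatim = answer.length > 0 && context.includes(answer);
    fields[name] = score >= minScore && verbatim ? { answer, score } : null;
  }
  return fields;
}

// Usage, with `qa` created by pipeline('question-answering', ...):
// const fields = await extractFields(qa, passage, {
//   netIncome: 'What is the net income?',
//   growth: 'How much did net income grow?',
// });
```

Fields that fail either check come back as null, so missing values stay visible to the caller. A suitable threshold depends on the documents and should be tuned on real examples.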

Attribution

The weights and ONNX conversion are based on Xenova/distilbert-base-cased-distilled-squad, which is itself derived from distilbert-base-cased-distilled-squad by Hugging Face. The original model is licensed under Apache 2.0.

License

Apache 2.0
