Overview

Quark-135m-Bilingual is a compact bilingual language model designed for Italian and English, built entirely from scratch by ThingsAI. It represents the second generation of the Quark model family, featuring a custom bilingual BPE tokenizer and a modern transformer architecture.

This is the base pretrained model. An SFT (instruction-tuned) version trained on bilingual conversational data is available for chat applications.

Model Details

| Property | Value |
|---|---|
| Parameters | 135M (143.98M with embeddings) |
| Architecture | Decoder-only Transformer |
| Vocabulary | 65,536 tokens (custom bilingual BPE) |
| Context Length | 2,048 tokens |
| Precision | BF16 |
| Languages | Italian, English |
| Tokenizer | ThingAI/QuarkTokenizer |
| License | Apache 2.0 |

Architecture

Quark-135m follows a SmolLM-inspired design optimized for efficiency at small scale:

| Component | Details |
|---|---|
| Attention | Grouped Query Attention (GQA) |
| Heads | 9 query heads, 3 KV heads |
| Head Dimension | 64 |
| Model Dimension | 576 |
| Layers | 30 |
| FFN Dimension | 1,536 |
| FFN Activation | SwiGLU |
| Normalization | RMSNorm (pre-attention & pre-FFN) |
| Positional Encoding | Rotary Position Embeddings (RoPE) |
| Weight Tying | Yes (embedding ↔ LM head) |
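
As a rough sanity check, the hyperparameters in the table above can be turned into a parameter estimate. The sketch below is not taken from the model repository. It assumes bias-free linear layers, a three-matrix SwiGLU feed-forward block (gate, up, down), one RMSNorm gain vector before each sub-block plus a final one, and a tied embedding counted once. The function name `count_parameters` is hypothetical.

```python
# Hyperparameters copied from the architecture and model-details tables.
VOCAB = 65_536
D_MODEL = 576
N_LAYERS = 30
N_Q_HEADS = 9
N_KV_HEADS = 3
HEAD_DIM = 64
D_FFN = 1_536


def count_parameters() -> dict:
    """Estimate parameter counts under the assumptions stated in the lead-in."""
    q_dim = N_Q_HEADS * HEAD_DIM    # 576, equal to D_MODEL
    kv_dim = N_KV_HEADS * HEAD_DIM  # 192: each KV head serves 3 query heads

    # Q and O projections are D_MODEL x q_dim; K and V are D_MODEL x kv_dim.
    attention = 2 * D_MODEL * q_dim + 2 * D_MODEL * kv_dim
    ffn = 3 * D_MODEL * D_FFN       # gate, up and down projections (assumed)
    norms = 2 * D_MODEL             # pre-attention and pre-FFN RMSNorm gains

    blocks = N_LAYERS * (attention + ffn + norms)
    final_norm = D_MODEL            # assumed final RMSNorm
    embedding = VOCAB * D_MODEL     # counted once because of weight tying
    total = blocks + final_norm + embedding
    return {
        "per_layer": attention + ffn + norms,
        "non_embedding": blocks + final_norm,
        "embedding": embedding,
        "total": total,
        "embedding_share": embedding / total,
    }


counts = count_parameters()
for name, value in counts.items():
    print(f"{name}: {value:,.4f}" if isinstance(value, float) else f"{name}: {value:,}")
```

Under these assumptions the total comes to 143,952,192 parameters, close to the 143.98M reported above, with the embedding matrix accounting for roughly 26% of it. The small remaining gap would come from terms the table does not list.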

Training

Pretraining Data

Quark-135m v0.2 was pretrained on 15.7B tokens from a curated bilingual mix:

| Subset | Weight | Source |
|---|---|---|
| FineWeb-2 (Italian) | 29% | HuggingFaceFW/fineweb-2 [ita_Latn] |
| CulturaX (Italian) | 14% | uonlp/CulturaX [it] |
| Wikipedia (Italian) | 7% | wikimedia/wikipedia [20231101.it] |
| FineWeb (English) | 36% | HuggingFaceFW/fineweb [sample-10BT] |
| Wikipedia (English) | 7% | wikimedia/wikipedia [20231101.en] |
| The Stack (Code) | 7% | bigcode/the-stack-smol |
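
To make the mix concrete, the weights can be converted into approximate token budgets. This sketch assumes that each weight is a share of the 15.7B training tokens (the card does not say whether weights refer to tokens, documents, or sampling probability). The language grouping and the helper names `token_budget` and `share_by_group` are illustrative.

```python
TOTAL_TOKENS = 15.7e9  # pretraining tokens reported above

# subset -> (weight in percent, language group); the grouping is an assumption
MIX = {
    "FineWeb-2 (Italian)": (29, "it"),
    "CulturaX (Italian)": (14, "it"),
    "Wikipedia (Italian)": (7, "it"),
    "FineWeb (English)": (36, "en"),
    "Wikipedia (English)": (7, "en"),
    "The Stack (Code)": (7, "code"),
}


def token_budget(total: float, mix: dict) -> dict:
    """Approximate tokens per subset, treating weights as token shares."""
    return {name: total * weight / 100 for name, (weight, _) in mix.items()}


def share_by_group(mix: dict) -> dict:
    """Sum the percentage weights per language group."""
    shares: dict = {}
    for weight, group in mix.values():
        shares[group] = shares.get(group, 0) + weight
    return shares


for name, tokens in token_budget(TOTAL_TOKENS, MIX).items():
    print(f"{name}: {tokens / 1e9:.2f}B tokens")
print(share_by_group(MIX))
```

Read this way, the Italian sources sum to 50% of the mix, the English sources to 43%, and code to 7%.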

Chat Format

The model uses a simple chat template:

<|user|>
{user message}
<|end|>
<|assistant|>
{model response}
<|end|>
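
A prompt in this format can be assembled with a few lines of string handling. The helper below is a minimal sketch, not part of the released code. The name `build_prompt`, the message dictionaries, and the `add_generation_prompt` flag are assumptions chosen for illustration. Only the `<|user|>`, `<|assistant|>`, and `<|end|>` markers come from the template above.

```python
def build_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into the chat template.

    Each turn becomes "<|role|>\n{content}\n<|end|>\n". When
    add_generation_prompt is True, an open assistant turn is appended so the
    model continues with its reply.
    """
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|{role}|>\n{message['content']}\n<|end|>\n")
    if add_generation_prompt:
        parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = build_prompt([{"role": "user", "content": "What is artificial intelligence?"}])
print(prompt)
```

Generation would then stop when the model emits `<|end|>`, assuming that marker is registered as the tokenizer's end-of-sequence token.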

Tokenizer

Quark-135m v0.2 uses a custom bilingual BPE tokenizer (ThingAI/QuarkTokenizer) specifically designed for Italian and English:

  • Vocabulary: 65,536 tokens
  • Type: Byte-Pair Encoding (BPE)
  • Languages: Balanced Italian + English coverage
  • Published: ThingAI/QuarkTokenizer

Usage

Loading the Model

Quark uses a custom architecture. To load and run inference:

import torch
import json
from safetensors.torch import load_file
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("ThingAI/Quark-135m-v0.2")

# Load model (requires custom architecture classes; see repository)
# Full architecture code available in the model repository

Generation Example

prompt = "<|user|>\nCos'è l'intelligenza artificiale?\n<|end|>\n<|assistant|>\n"
ids = tokenizer.encode(prompt, return_tensors="pt").to("cuda")

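# Token-by-token generation with temperature and top-k sampling.
# `model` is assumed to be an instance of the custom architecture class from the
# repository, in eval mode on the GPU, returning logits of shape (batch, seq, vocab).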
with torch.no_grad():
    for _ in range(200):
        logits = model(ids)[:, -1, :] / 0.7  # temperature
        topk = torch.topk(logits, 40)
        probs = torch.softmax(topk.values, -1)
        idx = topk.indices.gather(-1, torch.multinomial(probs, 1))
        ids = torch.cat([ids, idx], -1)
        if idx.item() == tokenizer.eos_token_id:
            break

print(tokenizer.decode(ids[0], skip_special_tokens=False))
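
The sampling step inside the loop (temperature scaling, top-k truncation, softmax over the survivors, one random draw) can be isolated in a dependency-free sketch. It works on a plain list of logits rather than a tensor, and `sample_top_k` is a hypothetical helper name, not a function from the repository. The defaults mirror the values used above (temperature 0.7, k = 40).

```python
import math
import random


def sample_top_k(logits, k=40, temperature=0.7, rng=random):
    """Draw one token index from the k highest-scoring logits."""
    if k <= 0 or temperature <= 0:
        raise ValueError("k and temperature must be positive")
    # Temperature below 1 sharpens the distribution, above 1 flattens it.
    scaled = [value / temperature for value in logits]
    # Keep the indices of the k largest scaled logits.
    top = sorted(range(len(scaled)), key=scaled.__getitem__, reverse=True)[:k]
    # Softmax numerators over the survivors, shifted by the max for stability.
    peak = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - peak) for i in top]
    # random.choices normalizes the weights internally.
    return rng.choices(top, weights=weights, k=1)[0]


toy_logits = [0.1, 2.0, -1.0, 1.5]
print(sample_top_k(toy_logits, k=2, rng=random.Random(0)))
```

With `k=1` the function reduces to greedy decoding, since only the highest-scoring index survives the truncation.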

Limitations

  • Scale: At 135M parameters, the model has limited factual knowledge and reasoning capacity
  • Hallucination: The model frequently generates plausible but incorrect information
  • Mathematics: Cannot reliably perform arithmetic beyond simple operations
  • Code: Generates syntactically plausible but often non-functional code
  • Vocabulary overhead: The 65k vocabulary consumes ~26% of model parameters in the embedding layer, reducing transformer capacity (a key lesson for v0.3)
  • Pretraining plateau: Loss plateaued at ~4.6 due to the vocab/parameter ratio imbalance

Comparison with v0.1

| | Quark-135m v0.1 | Quark-135m v0.2 |
|---|---|---|
| Tokenizer | cosmo2 (49k) | QuarkTokenizer (65k) |
| Languages | Math-focused (EN) | Bilingual IT+EN |
| Training Data | 15B tokens (math-heavy) | 15.7B tokens (bilingual web + code) |
| Final Loss | ~3.5-4.0 | 4.635 |
| Strengths | Arithmetic, math reasoning | Italian fluency, bilingual chat |

Citation

@misc{quark2026,
  title={Quark: A Family of Compact Bilingual Language Models},
  author={Di Nicola, Michelangelo},
  year={2026},
  publisher={ThingsAI},
  url={https://huggingface.co/ThingAI/Quark-135m-v0.2}
}

Links

Built from scratch by ThingsAI 🇮🇹
