Quark-270M Base — Bilingual Italian-English Language Model

Quark-270M Base is a compact bilingual language model for Italian and English, built entirely from scratch by ThingsAI. This is the raw pretrained model optimized for text completion. For conversational use, see Quark-270M-Instruct.

Model Details

Parameters      252M (with weight tying)
Architecture    Decoder-only Transformer
Vocabulary      65,537 tokens (QuarkTokenizer, bilingual BPE)
Context Length  2,048 tokens
Precision       BF16
Languages       Italian, English
License         Apache 2.0

Architecture

Component            Details
Model Dimension      768
Layers               32
Attention            Grouped Query Attention (GQA)
Query Heads          12
KV Heads             4 (3:1 ratio)
Head Dimension       64
FFN Dimension        2,048
FFN Activation       SwiGLU
Normalization        RMSNorm (pre-norm)
Positional Encoding  RoPE (θ=10,000)
Weight Tying         embed_tokens ↔ lm_head
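
The 252M figure can be reproduced from the table above. The sketch below is a back-of-the-envelope count, not the authors' own accounting. It assumes bias-free linear layers, two RMSNorm weight vectors per layer plus one final norm, and a single embedding matrix shared with the output head; none of these details is confirmed by the card beyond the weight tying. It also estimates the BF16 KV cache for one full-length sequence.

```python
# Values taken from the Model Details and Architecture tables.
VOCAB = 65_537
D_MODEL = 768
N_LAYERS = 32
N_Q_HEADS = 12
N_KV_HEADS = 4
HEAD_DIM = 64
D_FFN = 2_048
CONTEXT = 2_048


def count_parameters() -> int:
    """Approximate parameter count (assumes no biases and a tied lm_head)."""
    embed = VOCAB * D_MODEL  # shared by embed_tokens and lm_head
    q_proj = D_MODEL * N_Q_HEADS * HEAD_DIM
    kv_proj = 2 * D_MODEL * N_KV_HEADS * HEAD_DIM  # K and V use 4 heads, not 12
    o_proj = N_Q_HEADS * HEAD_DIM * D_MODEL
    ffn = 3 * D_MODEL * D_FFN  # SwiGLU: gate, up, and down projections
    norms = 2 * D_MODEL  # assumed: one RMSNorm before attention, one before the FFN
    per_layer = q_proj + kv_proj + o_proj + ffn + norms
    return embed + N_LAYERS * per_layer + D_MODEL  # plus an assumed final RMSNorm


def kv_cache_bytes(seq_len: int = CONTEXT, bytes_per_value: int = 2) -> int:
    """KV cache size for one sequence; BF16 stores 2 bytes per value."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * seq_len * bytes_per_value


total = count_parameters()
print(f"parameters: {total:,} (~{total / 1e6:.0f}M)")
print(f"KV cache at {CONTEXT} tokens: {kv_cache_bytes() / 2**20:.0f} MiB")
```

Under these assumptions the count comes to roughly 252M, matching the stated size. With 12 KV heads instead of 4, the same cache would be three times larger, which is the saving the 3:1 GQA ratio buys at inference time.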

Pretraining

Data

Trained on ~10B tokens from a curated bilingual mix:

Subset               Weight  Source
FineWeb-2 (Italian)  29%     HuggingFaceFW/fineweb-2 [ita_Latn]
CulturaX (Italian)   14%     uonlp/CulturaX [it]
Wikipedia (Italian)  7%      wikimedia/wikipedia [20231101.it]
FineWeb (English)    36%     HuggingFaceFW/fineweb [sample-10BT]
Wikipedia (English)  7%      wikimedia/wikipedia [20231101.en]
The Stack (Code)     7%      bigcode/the-stack-smol

Language split: Italian 50% · English 43% · Code 7%
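
As a quick consistency check, the weights above can be turned into an approximate token budget per subset. This is a sketch that assumes the percentages are fractions of training tokens (the card does not say whether they refer to tokens or documents) and treats "~10B" as exactly 10 billion.

```python
TOTAL_TOKENS = 10_000_000_000  # "~10B" from the card, treated as exact here

# (subset, language, weight in percent) copied from the table above.
MIX = [
    ("FineWeb-2 (Italian)", "Italian", 29),
    ("CulturaX (Italian)", "Italian", 14),
    ("Wikipedia (Italian)", "Italian", 7),
    ("FineWeb (English)", "English", 36),
    ("Wikipedia (English)", "English", 7),
    ("The Stack (Code)", "Code", 7),
]


def language_split(mix):
    """Sum the percentage weights per language."""
    split = {}
    for _, language, weight in mix:
        split[language] = split.get(language, 0) + weight
    return split


def token_budget(mix, total_tokens):
    """Approximate tokens per subset, assuming weights are token fractions."""
    return {name: total_tokens * weight // 100 for name, _, weight in mix}


split = language_split(MIX)
budget = token_budget(MIX, TOTAL_TOKENS)
for name, tokens in budget.items():
    print(f"{name:<22} ~{tokens / 1e9:.1f}B tokens")
print(split)
```

The per-language sums reproduce the 50% / 43% / 7% split stated above.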

Training Configuration

Hardware         NVIDIA B200
Total Tokens     ~10B
Batch Size       64 sequences × 4 gradient-accumulation steps = 256 sequences (~524k tokens) per optimizer step
Sequence Length  2,048 tokens
Learning Rate    3e-4 → 3e-5 (cosine decay)
Warmup Steps     1,000
Optimizer        AdamW (β₁=0.9, β₂=0.95)
Precision        BF16 mixed precision
Throughput       ~281k tokens/sec
Training Time    ~10 hours
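
The figures in this table imply a step count and a learning-rate curve, sketched below. The shape of the schedule (linear warmup followed by cosine decay down to 3e-5 at the final step) is an assumption; the card states only the endpoints, the warmup length, and that the decay is cosine. "~10B" is again treated as exactly 10 billion tokens.

```python
import math

# Values from the Training Configuration table.
MICRO_BATCH = 64
GRAD_ACCUM = 4
SEQ_LEN = 2_048
TOTAL_TOKENS = 10_000_000_000  # "~10B", treated as exact for the estimate
THROUGHPUT = 281_000  # tokens per second
PEAK_LR, MIN_LR = 3e-4, 3e-5
WARMUP_STEPS = 1_000

tokens_per_step = MICRO_BATCH * GRAD_ACCUM * SEQ_LEN
total_steps = TOTAL_TOKENS // tokens_per_step
hours = TOTAL_TOKENS / THROUGHPUT / 3600


def learning_rate(step: int) -> float:
    """Assumed schedule: linear warmup, then cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    progress = min(progress, 1.0)  # hold MIN_LR if stepped past the end
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))


print(f"tokens per optimizer step: {tokens_per_step:,}")
print(f"optimizer steps: ~{total_steps:,}")
print(f"estimated wall-clock time: ~{hours:.1f} h")
for step in (0, WARMUP_STEPS, total_steps // 2, total_steps):
    print(f"step {step:>6}: lr = {learning_rate(step):.2e}")
```

At the stated throughput, 10B tokens take a little under 10 hours, which agrees with the training time reported in the table.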

Usage

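from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the pretrained weights in BF16 and move the model to the GPU.
# trust_remote_code=True lets transformers run the model code shipped with the repo.
model = AutoModelForCausalLM.from_pretrained(
    "ThingAI/Quark-270m-Base",
    trust_remote_code=True,
    torch_dtype="bfloat16",
).cuda()

tokenizer = AutoTokenizer.from_pretrained("ThingAI/Quark-270m-Base")

# Tokenize an Italian prompt and sample up to 100 new tokens.
inputs = tokenizer("L'Italia è un paese", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7, top_k=40)

# Decode the prompt plus its continuation, dropping any special tokens.
print(tokenizer.decode(out[0], skip_special_tokens=True))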

Note: This is a base model for text completion. For chat and instructions, use Quark-270M-Instruct.

Limitations

  • Scale: With 252M parameters, the model has limited factual knowledge and limited capacity for complex reasoning
  • Hallucination: Can generate plausible-sounding but often incorrect information
  • Mathematics: Limited arithmetic capabilities
  • Code: Can produce syntactically plausible but often non-functional code

The Quark Family

Model                Parameters  Type
Quark-50M            51M         Base
Quark-135M           135M        Base
Quark-270M Base      252M        Base
Quark-270M-Instruct  252M        Chat

Built from scratch by ThingsAI 🇮🇹