Nafie-473M

Nafie-473M is a compact Turkish causal language model based on a custom decoder-only Transformer architecture. It was developed with a Turkish-first design goal and evaluated on Turkish benchmark tasks, including the CETVEL benchmark suite. The model is mainly intended for Turkish text generation, instruction-following experiments, supervised fine-tuning research, and controlled local or Colab inference.

The model uses a custom architecture rather than a stock LLaMA, GPT-2, Mistral, or other architecture built into Transformers. The modeling code is published in the Hugging Face repository, and the model loads through `AutoModelForCausalLM`.

Because this code is not part of the Transformers library, loading Nafie-473M requires `trust_remote_code=True`, which downloads and runs the repository's Python code on your machine.

Architecture Summary

| Property | Value |
| --- | --- |
| Model name | Nafie-473M |
| Repository | `nafie-ai/nafie-473M` |
| Architecture | Custom decoder-only causal LM |
| Primary language | Turkish |
| Layers | 36 |
| Hidden size | 1024 |
| Attention heads | 4 |
| Context length | 1024 tokens |
| Tokenizer | Custom BPE tokenizer |
| License | Apache-2.0 |
| Framework | PyTorch + Hugging Face Transformers |

The architecture uses RMSNorm, RoPE-style rotary positional embeddings, SwiGLU-style feed-forward blocks, and tied input/output embeddings, meaning the output projection reuses the token-embedding matrix.
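The components listed above are standard building blocks. The pure-Python sketch below shows what each one computes on a single token vector; it is a generic illustration, not the Nafie-473M implementation. The adjacent-pair RoPE layout, the RoPE base of 10000, the RMSNorm epsilon, and all function names are assumptions, and the repository's custom code may differ. With a hidden size of 1024 and 4 attention heads, and assuming the usual equal split, RoPE would act on each head's 1024 / 4 = 256 dimensions.

```python
import math

# Hypothetical helpers; names, eps, and the RoPE base are illustrative assumptions.


def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: divide by the root-mean-square of x, then apply a learned gain.
    # Unlike LayerNorm there is no mean subtraction and no bias.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def rope(x, pos, base=10000.0):
    # Rotary embedding: rotate each pair (x[i], x[i+1]) by pos * base**(-i/d),
    # so query/key dot products depend only on the relative position.
    d = len(x)
    out = list(x)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out


def silu(v):
    # Numerically stable SiLU (swish): v * sigmoid(v).
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def matvec(W, x):
    # W is a list of rows; returns W @ x.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: down(silu(gate(x)) * up(x)).
    gated = [silu(g) * u for g, u in zip(matvec(w_gate, x), matvec(w_up, x))]
    return matvec(w_down, gated)


def tied_lm_logits(h, embedding):
    # Tied embeddings: the output projection reuses the input embedding
    # matrix (one row per token), so the logits are embedding @ h.
    return matvec(embedding, h)
```

The `rope` rotation is why attention scores see relative rather than absolute positions: rotating a query by position m and a key by position n gives the same dot product as rotating them by m - n and 0.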

CETVEL Benchmark

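Nafie-473M was evaluated on CETVEL, a Turkish benchmark suite covering multiple task families. The table below reports scores on four of them, multiple-choice question answering (MCQA), natural language inference (NLI), question answering (QA), and text classification (TC), alongside results for other models.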

| Model | MCQA | NLI | QA | TC |
| --- | --- | --- | --- | --- |
| Nafie-473M | 44.22 | 34.03 | 11.00 | 37.95 |
| `Kumru-7B` | 57.64 | 37.42 | 16.30 | 63.39 |
| `Llama-3.3-70B-Instruct` | 60.70 | 37.10 | 23.97 | 63.73 |
| `Kumru-2B` | 39.69 | 37.97 | 6.50 | 47.57 |
| `Trendyol/Llama-3-Trendyol-LLM-8b-chat-v2.0` | 53.28 | 37.29 | 0.17 | 54.06 |
| `Trendyol/Trendyol-LLM-7b-chat-v4.1.0` | 54.94 | 35.71 | 0.34 | 52.12 |
| `google/gemma-3-27b-it` | 55.40 | 36.73 | 10.56 | 53.65 |
| `google/gemma-3-12b-it` | 52.66 | 34.93 | 10.26 | 54.38 |
| `Qwen/Qwen2-72B-Instruct` | 61.27 | 35.59 | 0.83 | 60.47 |
| `CohereLabs/aya-expanse-32b` | 52.47 | 35.93 | 0.67 | 50.67 |
| `CohereLabs/aya-expanse-8b` | 44.09 | 37.12 | 0.19 | 50.03 |
| `google/gemma-3-4b-it` | 42.33 | 31.11 | 8.22 | 46.15 |
| `ytu-ce-cosmos/Turkish-Gemma-9b-v0.1` | 51.85 | 32.68 | 0.11 | 46.97 |
| `meta-llama/Llama-3.2-11B-Vision-Instruct` | 45.66 | 37.49 | 4.37 | 47.88 |
| `meta-llama/Llama-3.1-8B-Instruct` | 45.77 | 38.99 | 3.30 | 46.51 |
| `google/gemma-2-9b-it` | 48.20 | 35.76 | 0.46 | 45.38 |
| `ytu-ce-cosmos/turkish-gpt2-large-750m-instruct-v0.1` | 35.20 | 37.60 | 0.28 | 52.77 |
| `Qwen/Qwen2-7B-Instruct` | 49.66 | 35.33 | 1.53 | 52.52 |
| `meta-llama/Llama-3.2-3B-Instruct` | 37.00 | 33.25 | 7.52 | 39.00 |

Quickstart

Install the required packages:

```bash
pip install -U "transformers[torch]" huggingface_hub safetensors
```

Load and run the model:

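```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo_id = "nafie-ai/nafie-473M"

# trust_remote_code=True is required because the model class lives in the repository.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    # Half precision on GPU, full precision on CPU.
    # (`dtype` is named `torch_dtype` in older Transformers releases.)
    dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
)

# "What is the capital of Türkiye?" The <s> ... </s> markers are written into the
# prompt explicitly, so the tokenizer is told not to add special tokens itself.
prompt = "<s>Türkiye'nin başkenti neresidir?</s>"
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.3,
    top_k=3,
    top_p=0.95,
    repetition_penalty=1.2,
)
# Drop the prompt tokens and decode only the newly generated continuation.
generated_ids = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(generated_ids, skip_special_tokens=True).strip())
```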
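The quickstart samples with `temperature=0.3`, `top_k=3`, `top_p=0.95`, and `repetition_penalty=1.2`. To make their combined effect concrete, here is a simplified pure-Python sketch of one sampling step over a toy logit vector. It roughly mirrors the order Transformers uses (repetition penalty, then temperature, top-k, and top-p), but it is an approximation rather than the library's code, and the function names are hypothetical.

```python
import math
import random


def apply_repetition_penalty(logits, seen_ids, penalty):
    # CTRL-style penalty on every token already in the context (prompt included):
    # positive logits are divided by the penalty, negative ones multiplied.
    out = list(logits)
    for t in set(seen_ids):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def filter_logits(logits, temperature, top_k, top_p):
    # Temperature below 1 sharpens the distribution.
    scaled = [v / temperature for v in logits]
    # Top-k: only the k highest-scoring tokens remain candidates.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    probs = softmax([scaled[i] for i in order])
    # Top-p: keep the smallest prefix of candidates whose mass reaches top_p.
    nucleus, mass = set(), 0.0
    for i, p in zip(order, probs):
        nucleus.add(i)
        mass += p
        if mass >= top_p:
            break
    return [v if i in nucleus else -math.inf for i, v in enumerate(scaled)]


def sample_next(logits, seen_ids, temperature=0.3, top_k=3, top_p=0.95,
                repetition_penalty=1.2, rng=random):
    # One decoding step: penalize repeats, filter, renormalize, then sample.
    logits = apply_repetition_penalty(logits, seen_ids, repetition_penalty)
    probs = softmax(filter_logits(logits, temperature, top_k, top_p))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With `top_k=3`, at most three tokens can ever be chosen, and the low temperature pushes most of the mass onto the best one. For the toy logits `[5.0, 4.0, 1.0, 0.0, -1.0]` with no prior tokens, only index 0 survives top-p, so `sample_next` always returns 0; if token 0 has already appeared, the repetition penalty lowers it enough that indices 0 and 1 both stay in play.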

Pipeline Usage

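```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="nafie-ai/nafie-473M",
    trust_remote_code=True,
    device_map="auto",
)

# "What are Turkish language models used for?"
result = generator(
    "<s>Türkçe dil modelleri ne işe yarar?</s>",
    max_new_tokens=128,
    do_sample=True,
    temperature=0.3,
    top_k=3,
    top_p=0.95,
    repetition_penalty=1.2,
    return_full_text=False,  # return only the continuation, not the prompt
)
print(result[0]["generated_text"])
```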

Recommended Generation Settings

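For interactive Turkish generation, the following keyword arguments to `model.generate` (or to the pipeline call) are a useful starting point. Because the context window is 1,024 tokens, `max_new_tokens=700` leaves room for prompts of up to about 324 tokens; lower it for longer prompts.

```python
{
    "max_new_tokens": 700,
    "do_sample": True,
    "temperature": 0.3,
    "top_k": 3,
    "top_p": 0.95,
    "repetition_penalty": 1.2,
}
```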
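Since the prompt and the generated tokens share the 1,024-token context window, it can help to clamp `max_new_tokens` per request. The helper below is a hypothetical sketch; its names are illustrative, and how the custom model code behaves past 1,024 positions (truncation, an error, or degraded output) is not documented here.

```python
CONTEXT_LENGTH = 1024  # from the architecture table

RECOMMENDED_SETTINGS = {
    "max_new_tokens": 700,
    "do_sample": True,
    "temperature": 0.3,
    "top_k": 3,
    "top_p": 0.95,
    "repetition_penalty": 1.2,
}


def fit_max_new_tokens(prompt_len, requested, context_len=CONTEXT_LENGTH):
    # Leave room for the prompt; refuse prompts that already fill the window.
    if prompt_len >= context_len:
        raise ValueError(f"prompt has {prompt_len} tokens; context is {context_len}")
    return min(requested, context_len - prompt_len)


def settings_for_prompt(prompt_len, settings=RECOMMENDED_SETTINGS):
    # Copy the recommended settings and clamp max_new_tokens to the remaining budget.
    adjusted = dict(settings)
    adjusted["max_new_tokens"] = fit_max_new_tokens(prompt_len, settings["max_new_tokens"])
    return adjusted
```

In the quickstart, `prompt_len` would be `inputs["input_ids"].shape[-1]`, and the result can be passed as `model.generate(**inputs, **settings_for_prompt(prompt_len))`.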

Related SFT Dataset

Nafie-473M was developed together with a Turkish supervised fine-tuning dataset containing prompt-response pairs.

Related dataset:

`nafie-ai/nafie-sft-v1`

The dataset is intended for Turkish SFT, instruction-following, and prompt-response style training experiments.
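One common way to turn prompt-response pairs into causal-LM training examples is to compute the loss only on the response tokens. The sketch below illustrates that generic technique; it is not the documented Nafie training recipe. The layout (prompt ids, then response ids, then an end-of-sequence id), the `-100` ignore label, the 1,024-token truncation, and the helper name are all assumptions. The `-100` value matches the default `ignore_index` of PyTorch's `CrossEntropyLoss`; whether Nafie's custom code follows that convention is not stated.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_sft_example(prompt_ids, response_ids, eos_id, max_len=1024):
    # Hypothetical helper: concatenate prompt, response, and an end token.
    input_ids = prompt_ids + response_ids + [eos_id]
    # Mask prompt positions so only the response (and EOS) contribute to the loss.
    # Labels stay aligned with input_ids; Transformers-style causal LM heads
    # shift them by one position internally (assumed here for Nafie as well).
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [eos_id]
    # Truncate both sequences identically to the context length.
    return {"input_ids": input_ids[:max_len], "labels": labels[:max_len]}
```

In practice the prompt ids might come from tokenizing a `<s>...</s>`-wrapped prompt as in the quickstart, though the exact training format is an assumption.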

License

Nafie-473M is released under the Apache License 2.0.

See the LICENSE file for the full license text.

Acknowledgements

The numerical calculations reported in this work were fully performed at TUBITAK ULAKBIM, High Performance and Grid Computing Center (TRUBA resources).

Citation

If you use Nafie-473M, please cite the model repository:

Nafie-473M: A Turkish-focused decoder-only causal language model.
https://huggingface.co/nafie-ai/nafie-473M

If you use the related SFT dataset, please also cite:

Nafie SFT Dataset.
https://huggingface.co/datasets/nafie-ai/nafie-sft-v1
