Nafie-473M

Nafie-473M is a compact Turkish causal language model built on a custom decoder-only Transformer architecture. It was designed with Turkish as its primary language and evaluated on Turkish benchmark tasks, including the CETVEL benchmark suite. The model is mainly intended for Turkish text generation, instruction-following experiments, supervised fine-tuning (SFT) research, and controlled local or Colab inference.

The model uses its own architecture rather than a stock LLaMA, GPT-2, Mistral, or similar architecture built into Transformers. The custom model code is included in the Hugging Face repository, and the model loads through AutoModelForCausalLM.

Because Nafie-473M uses custom architecture code, it must be loaded with trust_remote_code=True.


Architecture Summary

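| Property | Value |
|---|---|
| Model name | Nafie-473M |
| Repository | nafie-ai/nafie-473M |
| Architecture | Custom decoder-only causal LM |
| Primary language | Turkish |
| Layers | 36 |
| Hidden size | 1024 |
| Attention heads | 4 |
| Context length | 1024 tokens |
| Tokenizer | Custom BPE tokenizer |
| License | Apache-2.0 |
| Framework | PyTorch + Hugging Face Transformers |
| Parameters | ~0.5B (473M) |
| Checkpoint format | Safetensors, F32 tensors |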
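The shape parameters in the table support a few back-of-the-envelope estimates. The sketch below assumes standard multi-head attention (full key/value heads, no grouped-query sharing) with bias-free projections and uses 473M as the parameter count implied by the model name. The custom modeling code may differ, so treat the results as estimates rather than measured values.

```python
HIDDEN_SIZE = 1024
NUM_LAYERS = 36
NUM_HEADS = 4
CONTEXT_LENGTH = 1024
APPROX_PARAMS = 473_000_000  # assumption: taken from the model name

head_dim = HIDDEN_SIZE // NUM_HEADS  # 256 dimensions per attention head

# Q, K, V and output projections, each HIDDEN_SIZE x HIDDEN_SIZE (assumed bias-free)
attn_params = NUM_LAYERS * 4 * HIDDEN_SIZE * HIDDEN_SIZE


def kv_cache_bytes(num_tokens, bytes_per_value=2):
    """Memory for cached keys and values; bytes_per_value=2 corresponds to float16."""
    # factor 2 = one key vector and one value vector per layer per token
    return 2 * NUM_LAYERS * num_tokens * HIDDEN_SIZE * bytes_per_value


def weight_gib(bytes_per_param):
    """Approximate size of the weights in GiB for a given bytes-per-parameter."""
    return APPROX_PARAMS * bytes_per_param / 2**30


print(f"head_dim: {head_dim}")
print(f"attention projection parameters: {attn_params:,}")
print(f"KV cache at {CONTEXT_LENGTH} tokens (fp16): {kv_cache_bytes(CONTEXT_LENGTH) / 2**20:.0f} MiB")
print(f"weights: {weight_gib(4):.2f} GiB in fp32, {weight_gib(2):.2f} GiB in fp16")
```

Under these assumptions, the attention projections hold about 151M of the ~473M parameters, with most of the rest in the SwiGLU feed-forward blocks and the tied embedding matrix, and a full 1024-token KV cache in fp16 takes about 144 MiB.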

The architecture includes RMSNorm, rotary positional embeddings (RoPE-style), SwiGLU-style feed-forward blocks, and tied input/output embeddings.
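For readers new to these components, the following pure-Python sketch shows textbook versions of RMSNorm, a SwiGLU feed-forward, rotary position embeddings, and a tied output head operating on plain lists. It is not Nafie-473M's modeling code: the RoPE base of 10000, the interleaved pairing of dimensions (many implementations rotate the two halves of the vector instead), the eps value, and all helper names are assumptions chosen for illustration.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    # Scale by the reciprocal root-mean-square; unlike LayerNorm, no mean subtraction or bias.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def matvec(rows, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def silu(v):
    # SiLU(v) = v * sigmoid(v), with sigmoid written via tanh to avoid exp overflow.
    return v * 0.5 * (1.0 + math.tanh(v / 2.0))


def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU: down(silu(gate(x)) * up(x)), elementwise product in the intermediate space.
    h = [silu(g) * u for g, u in zip(matvec(w_gate, x), matvec(w_up, x))]
    return matvec(w_down, h)


def apply_rope(x, position, base=10000.0):
    # Rotate each pair (x[i], x[i+1]), i even, by angle position * base**(-i/d).
    # len(x) must be even.
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out


def tied_lm_head(hidden, embedding):
    # Tied embeddings: logits reuse the input embedding matrix (one row per token id).
    return matvec(embedding, hidden)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# RoPE makes query-key scores depend only on relative position:
# positions (5, 3) and (2, 0) are both 2 apart, so the scores match.
q, k = [0.3, -1.2, 0.5, 0.8], [1.0, 0.4, -0.7, 0.2]
print(dot(apply_rope(q, 5), apply_rope(k, 3)), dot(apply_rope(q, 2), apply_rope(k, 0)))
```

The relative-position property shown in the last lines is the main reason rotary embeddings are applied to queries and keys rather than added to the input embeddings.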


CETVEL Benchmark

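Nafie-473M was evaluated on CETVEL, a Turkish benchmark suite covering multiple task families. The table below reports results on four of them: multiple-choice question answering (MCQA), natural language inference (NLI), question answering (QA), and text classification (TC). Higher scores are better.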

| Model | MCQA | NLI | QA | TC |
|---|---:|---:|---:|---:|
| Nafie-473M | 44.22 | 34.03 | 11.00 | 37.95 |
| Kumru-7B | 57.64 | 37.42 | 16.30 | 63.39 |
| Llama-3.3-70B-Instruct | 60.70 | 37.10 | 23.97 | 63.73 |
| Kumru-2B | 39.69 | 37.97 | 6.50 | 47.57 |
| Trendyol/Llama-3-Trendyol-LLM-8b-chat-v2.0 | 53.28 | 37.29 | 0.17 | 54.06 |
| Trendyol/Trendyol-LLM-7b-chat-v4.1.0 | 54.94 | 35.71 | 0.34 | 52.12 |
| google/gemma-3-27b-it | 55.40 | 36.73 | 10.56 | 53.65 |
| google/gemma-3-12b-it | 52.66 | 34.93 | 10.26 | 54.38 |
| Qwen/Qwen2-72B-Instruct | 61.27 | 35.59 | 0.83 | 60.47 |
| CohereLabs/aya-expanse-32b | 52.47 | 35.93 | 0.67 | 50.67 |
| CohereLabs/aya-expanse-8b | 44.09 | 37.12 | 0.19 | 50.03 |
| google/gemma-3-4b-it | 42.33 | 31.11 | 8.22 | 46.15 |
| ytu-ce-cosmos/Turkish-Gemma-9b-v0.1 | 51.85 | 32.68 | 0.11 | 46.97 |
| meta-llama/Llama-3.2-11B-Vision-Instruct | 45.66 | 37.49 | 4.37 | 47.88 |
| meta-llama/Llama-3.1-8B-Instruct | 45.77 | 38.99 | 3.30 | 46.51 |
| google/gemma-2-9b-it | 48.20 | 35.76 | 0.46 | 45.38 |
| ytu-ce-cosmos/turkish-gpt2-large-750m-instruct-v0.1 | 35.20 | 37.60 | 0.28 | 52.77 |
| Qwen/Qwen2-7B-Instruct | 49.66 | 35.33 | 1.53 | 52.52 |
| meta-llama/Llama-3.2-3B-Instruct | 37.00 | 33.25 | 7.52 | 39.00 |
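Because the models in this table differ in size by more than two orders of magnitude, it can help to read off where Nafie-473M lands on each task. The snippet below only re-sorts the numbers printed in the table; the helper rank_of is hypothetical and is not part of CETVEL's official tooling.

```python
# (model, MCQA, NLI, QA, TC), copied from the CETVEL table
SCORES = [
    ("Nafie-473M", 44.22, 34.03, 11.00, 37.95),
    ("Kumru-7B", 57.64, 37.42, 16.30, 63.39),
    ("Llama-3.3-70B-Instruct", 60.70, 37.10, 23.97, 63.73),
    ("Kumru-2B", 39.69, 37.97, 6.50, 47.57),
    ("Trendyol/Llama-3-Trendyol-LLM-8b-chat-v2.0", 53.28, 37.29, 0.17, 54.06),
    ("Trendyol/Trendyol-LLM-7b-chat-v4.1.0", 54.94, 35.71, 0.34, 52.12),
    ("google/gemma-3-27b-it", 55.40, 36.73, 10.56, 53.65),
    ("google/gemma-3-12b-it", 52.66, 34.93, 10.26, 54.38),
    ("Qwen/Qwen2-72B-Instruct", 61.27, 35.59, 0.83, 60.47),
    ("CohereLabs/aya-expanse-32b", 52.47, 35.93, 0.67, 50.67),
    ("CohereLabs/aya-expanse-8b", 44.09, 37.12, 0.19, 50.03),
    ("google/gemma-3-4b-it", 42.33, 31.11, 8.22, 46.15),
    ("ytu-ce-cosmos/Turkish-Gemma-9b-v0.1", 51.85, 32.68, 0.11, 46.97),
    ("meta-llama/Llama-3.2-11B-Vision-Instruct", 45.66, 37.49, 4.37, 47.88),
    ("meta-llama/Llama-3.1-8B-Instruct", 45.77, 38.99, 3.30, 46.51),
    ("google/gemma-2-9b-it", 48.20, 35.76, 0.46, 45.38),
    ("ytu-ce-cosmos/turkish-gpt2-large-750m-instruct-v0.1", 35.20, 37.60, 0.28, 52.77),
    ("Qwen/Qwen2-7B-Instruct", 49.66, 35.33, 1.53, 52.52),
    ("meta-llama/Llama-3.2-3B-Instruct", 37.00, 33.25, 7.52, 39.00),
]
TASKS = ("MCQA", "NLI", "QA", "TC")


def rank_of(model, task):
    """1-based rank of a model on one task (highest score ranks first)."""
    col = TASKS.index(task) + 1
    ordered = sorted(SCORES, key=lambda row: row[col], reverse=True)
    return [row[0] for row in ordered].index(model) + 1


for task in TASKS:
    print(f"{task}: Nafie-473M ranks {rank_of('Nafie-473M', task)} of {len(SCORES)}")
```

Run on these numbers, QA is the task where Nafie-473M places highest: 3rd of 19, behind only Llama-3.3-70B-Instruct and Kumru-7B.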

Quickstart

Install the required packages:

```bash
pip install -U "transformers[torch]" huggingface_hub safetensors
```

Load and run the model:

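```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo_id = "nafie-ai/nafie-473M"

# trust_remote_code=True is required because the architecture is defined in the repository's own code.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    # Half precision on GPU, full precision on CPU (older Transformers versions call this torch_dtype).
    dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
)

# The prompt is wrapped in <s>...</s> by hand, so add_special_tokens=False keeps the
# tokenizer from adding its own special tokens on top. ("What is the capital of Türkiye?")
prompt = "<s>Türkiye'nin başkenti neresidir?</s>"
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.3,
    top_k=3,
    top_p=0.95,
    repetition_penalty=1.2,
)
# Drop the prompt tokens and decode only the newly generated continuation.
generated_ids = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(generated_ids, skip_special_tokens=True).strip())
```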
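Nafie-473M has a 1024-token context window, so prompt tokens plus max_new_tokens should stay within 1024. With the max_new_tokens=700 suggested under Recommended Generation Settings, that leaves 324 tokens for the prompt. The sketch below keeps only the most recent prompt tokens under that budget. The helper name fit_to_context is hypothetical, and the sketch assumes the custom model cannot attend past 1024 positions; note that left-truncation can drop the leading <s> marker, which you may need to re-add.

```python
CONTEXT_LENGTH = 1024  # from the architecture summary table


def fit_to_context(prompt_ids, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Return the most recent prompt token ids that leave room for max_new_tokens."""
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than the context length")
    return prompt_ids[-budget:]  # unchanged when the prompt already fits


# With the real tokenizer (hypothetical usage):
#   ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
#   ids = fit_to_context(ids, max_new_tokens=700)

# Example with plain integers standing in for token ids:
trimmed = fit_to_context(list(range(500)), max_new_tokens=700)
print(len(trimmed))  # 324
```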

Pipeline Usage

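```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="nafie-ai/nafie-473M",
    trust_remote_code=True,
    device_map="auto",
)

# ("What are Turkish language models used for?")
result = generator(
    "<s>Türkçe dil modelleri ne işe yarar?</s>",
    max_new_tokens=128,
    do_sample=True,
    temperature=0.3,
    top_k=3,
    top_p=0.95,
    repetition_penalty=1.2,
    return_full_text=False,  # return only the continuation, not the prompt
)
print(result[0]["generated_text"])
```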

Recommended Generation Settings

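For interactive Turkish generation, the following keyword arguments to model.generate() are a useful starting point:

```python
generation_kwargs = {
    "max_new_tokens": 700,
    "do_sample": True,
    "temperature": 0.3,
    "top_k": 3,
    "top_p": 0.95,
    "repetition_penalty": 1.2,
}
# Usage: outputs = model.generate(**inputs, **generation_kwargs)
```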
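To make these settings concrete, this pure-Python sketch applies them to one vector of next-token logits, following in simplified form the order in which Transformers' generate() applies the corresponding processors: repetition penalty first, then temperature, top-k, and top-p. It is an illustration rather than the library's implementation; the function names and toy logits are made up.

```python
import math
import random


def filter_next_token(logits, generated_ids, temperature=0.3, top_k=3,
                      top_p=0.95, repetition_penalty=1.2):
    """Return {token_id: probability} after penalty, temperature, top-k and top-p."""
    scores = list(logits)
    # 1. Repetition penalty (CTRL-style): tokens already in the sequence become less likely.
    for tok in set(generated_ids):
        s = scores[tok]
        scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty
    # 2. Temperature: values below 1 sharpen the distribution.
    scores = [s / temperature for s in scores]
    # 3. Top-k: keep only the k highest-scoring token ids (sorted, best first).
    candidates = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the survivors, subtracting the max for numerical stability.
    m = scores[candidates[0]]
    exps = {i: math.exp(scores[i] - m) for i in candidates}
    total = sum(exps.values())
    # 4. Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = {}, 0.0
    for i in candidates:
        kept[i] = exps[i] / total
        cumulative += kept[i]
        if cumulative >= top_p:
            break
    norm = sum(kept.values())
    return {i: p / norm for i, p in kept.items()}


def sample(dist, rng=random):
    """Draw one token id from a {token_id: probability} mapping."""
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


dist = filter_next_token([2.0, 1.0, 0.5, -1.0, 3.0], generated_ids=[4])
print(dist)  # only token ids 4 and 0 survive
print(sample(dist, random.Random(0)))
```

With top_k=3, at most three candidates survive, so top_p=0.95 only trims the low-probability tail of those three; the low temperature then concentrates most of the probability on the top candidate.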

Related SFT Dataset

Nafie-473M was developed together with a Turkish supervised fine-tuning dataset containing prompt-response pairs.

Related dataset:

nafie-ai/nafie-sft-v1

The dataset is intended for Turkish SFT, instruction-following, and prompt-response style training experiments.
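A minimal sketch of turning one prompt-response pair into a causal-LM training example, assuming the common practice of masking prompt positions with -100 so that the loss covers only the response. The column names "prompt" and "response", the <s>…</s> wrapping (borrowed from the Quickstart prompt), and the helper build_sft_example are hypothetical; this card does not state how the nafie-sft-v1 data was formatted for training.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_sft_example(prompt_ids, response_ids, eos_id, max_len=1024):
    """Concatenate prompt and response token ids and mask the prompt in the labels.

    Stock Hugging Face causal LMs shift labels internally, so labels stay aligned
    position-by-position with input_ids; assuming Nafie's custom code does the same.
    """
    input_ids = prompt_ids + response_ids + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [eos_id]
    # Right-truncate to the 1024-token context window from the architecture table.
    return input_ids[:max_len], labels[:max_len]


# Hypothetical usage with the real tokenizer and one dataset row:
#   prompt_ids = tokenizer("<s>" + row["prompt"] + "</s>", add_special_tokens=False)["input_ids"]
#   response_ids = tokenizer(row["response"], add_special_tokens=False)["input_ids"]
#   input_ids, labels = build_sft_example(prompt_ids, response_ids, tokenizer.eos_token_id)

# Toy example with integer ids:
print(build_sft_example([1, 2, 3], [4, 5], eos_id=9))
```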


License

Nafie-473M is released under the Apache License 2.0.

See the LICENSE file for the full license text.


Acknowledgements

The numerical calculations reported in this paper were fully performed at TUBITAK ULAKBIM, High Performance and Grid Computing Center (TRUBA resources).


Citation

If you use Nafie-473M, please cite the model repository:

Nafie-473M: A Turkish-focused decoder-only causal language model.
https://huggingface.co/nafie-ai/nafie-473M

If you use the related SFT dataset, please also cite:

Nafie SFT Dataset.
https://huggingface.co/datasets/nafie-ai/nafie-sft-v1