
license: llama3
language:
  - tr

CERE-LLMA-3-8b-TR

This model is a fine-tuned version of the Llama 3 8B large language model (LLM) for Turkish. It was trained on high-quality Turkish instruction sets built from various open-source and internal resources. The Turkish instruction dataset was carefully annotated so that the model carries out Turkish instructions accurately and in an organized manner.

Model Details

Base Model: Llama 3 8B
Tokenizer Extension: Specifically extended for Turkish
Training Dataset: Cleaned raw Turkish data (5 billion tokens) and custom Turkish instruction sets
Training Method: Initially trained with DoRA (Weight-Decomposed Low-Rank Adaptation), then fine-tuned with LoRA (Low-Rank Adaptation)
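The card does not say how the tokenizer was extended. As a toy illustration only, the sketch below uses a greedy longest-match tokenizer (a swapped-in technique, not the byte-level BPE tokenizer Llama 3 actually uses) to show why adding frequent Turkish subwords to a vocabulary shortens token sequences for agglutinative words. The function `greedy_tokenize` and both vocabularies are hypothetical.

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text left to right into the longest pieces found in vocab.

    Any position with no matching piece falls back to a single character,
    so every input can be tokenized.
    """
    max_len = max([len(piece) for piece in vocab] + [1])
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for j in range(min(len(text), i + max_len), i, -1):
            piece = text[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


# Hypothetical vocabularies: a few generic pieces, then the same set
# extended with common Turkish subwords (stem, plural, possessive, ablative).
BASE_VOCAB = {"en", "er", "in"}
EXTENDED_VOCAB = BASE_VOCAB | {"ev", "ler", "imiz", "den"}

word = "evlerimizden"  # "from our houses"
print(greedy_tokenize(word, BASE_VOCAB))
# ['e', 'v', 'l', 'er', 'i', 'm', 'i', 'z', 'd', 'en']
print(greedy_tokenize(word, EXTENDED_VOCAB))
# ['ev', 'ler', 'imiz', 'den']
```

With the extended vocabulary the same word costs 4 tokens instead of 10, and the pieces line up with Turkish morphemes; a real BPE-based extension would learn such pieces from Turkish text rather than listing them by hand.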
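The card names DoRA and LoRA but gives no rank, scaling factor, or target layers. The sketch below shows, on a toy 2x2 matrix, how each method forms an adapted weight according to the published formulations: LoRA adds a scaled low-rank product, W' = W0 + (alpha / r) B A, while DoRA keeps that result only as a direction (columns normalized to unit length) and rescales each column by a learned magnitude m. The `alpha / r` scaling inside DoRA follows the common LoRA convention and is an assumption here; all helper names and numbers are hypothetical, and plain Python lists are used so the example has no dependencies.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain triple-loop product: (n x p) @ (p x q) -> (n x q).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def column_norms(w: Matrix) -> list[float]:
    # L2 norm of each column, the column-wise norm DoRA normalizes by.
    return [math.sqrt(sum(row[j] ** 2 for row in w)) for j in range(len(w[0]))]


def lora_weight(w0: Matrix, a: Matrix, b: Matrix, alpha: float, r: int) -> Matrix:
    # LoRA: W' = W0 + (alpha / r) * B @ A, with B (d x r) and A (r x k).
    delta = matmul(b, a)
    scale = alpha / r
    return [[w + scale * dw for w, dw in zip(row, drow)]
            for row, drow in zip(w0, delta)]


def dora_weight(w0: Matrix, a: Matrix, b: Matrix, m: list[float],
                alpha: float, r: int) -> Matrix:
    # DoRA: use the LoRA-updated matrix only as a direction (unit-norm
    # columns), then rescale column j by the learned magnitude m[j].
    v = lora_weight(w0, a, b, alpha, r)
    norms = column_norms(v)
    return [[m[j] * row[j] / norms[j] for j in range(len(row))] for row in v]


# Toy frozen weight and rank-1 adapter (all values hypothetical).
W0 = [[3.0, 0.0], [4.0, 1.0]]
A = [[1.0, 2.0]]           # r x k
B_init = [[0.0], [0.0]]    # d x r, zero-initialized
m_init = column_norms(W0)  # DoRA starts m at the column norms of W0

# At initialization both methods reproduce the frozen weight.
print(lora_weight(W0, A, B_init, alpha=2.0, r=1))          # [[3.0, 0.0], [4.0, 1.0]]
print(dora_weight(W0, A, B_init, m_init, alpha=2.0, r=1))  # [[3.0, 0.0], [4.0, 1.0]]

# With a nonzero B (m kept at its initial value for simplicity), the
# DoRA columns change direction but their norms still match m.
B_trained = [[1.0], [0.0]]
print(column_norms(dora_weight(W0, A, B_trained, m_init, alpha=2.0, r=1)))
```

The design point the example isolates is that DoRA decouples how much a column is scaled (m) from which way it points (the normalized W0 + BA), whereas LoRA changes both through the single low-rank update.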

Open LLM Turkish Leaderboard v0.2 Evaluation Results

Metric | Value
Avg. |
AI2 Reasoning Challenge_tr |
HellaSwag_tr |
MMLU_tr |
TruthfulQA_tr |
Winogrande_tr |
GSM8k_tr |