colorTo: gray
sdk: static
pinned: false
---
<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/63da3d7ae697e5898cb86854/4EnLA20pUWnvqppA5y5Q4.gif" alt="denoising_small_16_9" />
<h1>Diffutron: A Masked Diffusion Language Model for Turkish Language</h1>
</div>

<p align="center">
   | 🤗 <a href="https://huggingface.co/collections/diffutron/diffutronlm">Models</a>   |
   📊 <a href="https://huggingface.co/datasets/diffutron/DiffutronLM-Pretraining-Corpus">Pre-training Dataset</a>   |
   📄 <a href="">Paper</a>   |
</p>

## Overview

Diffutron is a lightweight, non-autoregressive Masked Diffusion Language Model (MDLM) specifically optimized for the Turkish language. By utilizing a discrete diffusion process, Diffutron generates text through iterative refinement, allowing for bi-directional context awareness and high parameter efficiency.
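The iterative-refinement loop can be sketched as follows. This is a toy illustration with a random stand-in for the denoiser, not Diffutron's actual inference code: the mask id, vocabulary size, and the confidence-based unmasking schedule are all assumptions for the sketch.

```python
import numpy as np

MASK_ID = -1   # placeholder mask token id (assumption, not Diffutron's actual id)
VOCAB = 32000  # placeholder vocabulary size (assumption)

def toy_denoiser(tokens, rng):
    """Stand-in for the bidirectional encoder backbone: returns a predicted
    token id and a confidence score for every position (random here)."""
    preds = rng.integers(0, VOCAB, size=tokens.shape[0])
    conf = rng.random(tokens.shape[0])
    return preds, conf

def iterative_unmask(length=16, steps=4, seed=0):
    """Start from an all-masked sequence and, at each step, commit the most
    confident predictions among the still-masked positions."""
    rng = np.random.default_rng(seed)
    tokens = np.full(length, MASK_ID)
    for step in range(steps):
        masked = np.flatnonzero(tokens == MASK_ID)
        if masked.size == 0:
            break
        preds, conf = toy_denoiser(tokens, rng)
        # reveal enough positions so the sequence is fully filled by the last step
        k = int(np.ceil(masked.size / (steps - step)))
        reveal = masked[np.argsort(-conf[masked])[:k]]
        tokens[reveal] = preds[reveal]
    return tokens
```

Because every position attends to every other at each step, revealed tokens on both sides of a mask inform the next round of predictions, which is what gives the model its bi-directional context awareness.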

## Core Features

* **Architecture:** Discrete Masked Diffusion (MDLM) using a 307M-parameter encoder backbone.
* **Efficiency:** Achieves competitive performance against 2B+ parameter autoregressive models on Turkish benchmarks.
* **Adaptation:** LoRA-based (r=256) continual pre-training on a 2M-sequence Turkish corpus.
* **Instruction Tuning:** Progressive strategy using the LlamaTurk and InstrucTurca datasets for enhanced instruction following.
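The LoRA adaptation above boils down to adding a trainable low-rank update, scaled by alpha/r, on top of a frozen pretrained weight. A minimal numpy sketch follows; only r=256 comes from this README, while the dimensions, the alpha value, and the zero-initialization of B are standard LoRA conventions, not Diffutron specifics.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=256, r=256):
    """Frozen weight W plus trainable low-rank update (alpha / r) * B @ A.
    Shapes: W (d_out, d_in), A (r, d_in), B (d_out, r)."""
    return x @ (W + (alpha / r) * (B @ A)).T

rng = np.random.default_rng(0)
d, r = 1024, 256                          # r=256 as in the README; d is illustrative
W = rng.standard_normal((d, d)) * 0.02    # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.02    # trainable down-projection
B = np.zeros((d, r))                      # trainable up-projection, zero-initialized
x = rng.standard_normal((3, d))
```

With B zero-initialized, the adapted layer is exactly the base model at the start of continual pre-training, so training only gradually moves the model away from its pretrained behavior while updating far fewer parameters than full fine-tuning.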

## Benchmarks

Diffutron achieves a significant reduction in perplexity and competitive scores across the CETVEL benchmark suite:

| Benchmark | Diffutron-1st-Stage (0.3B) | Diffutron-2nd-Stage (0.3B) | TURNA (1.1B) | Kumru (2B) | Kanarya (2B) | Llama-3.2 (3B) | Trendyol (7B) | Aya-101 (13B) |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **Belebele_TR** | 22.22 | 27.00 | 22.56 | 29.00 | 28.11 | **55.78** | 36.22 | 22.89 |
| **EXAMS_TR** | 25.95 | 27.74 | 23.66 | **30.03** | **30.03** | 26.21 | 28.50 | 22.90 |
| **IronyTR** | 50.67 | **52.00** | 48.33 | 51.00 | 50.00 | 50.17 | 50.00 | **52.17** |
| **News_Cat** | 23.20 | 32.40 | 32.80 | 26.40 | 66.80 | 64.00 | **81.20** | 20.00 |
| **MNLI_TR** | 33.29 | 32.81 | 34.94 | **36.42** | 33.40 | 34.76 | 35.19 | 27.90 |
| **STS_TR** | 17.77 | **18.78** | 14.21 | 11.75 | 12.91 | 12.91 | 15.52 | 16.97 |
| **XCOPA_TR** | 53.80 | 52.00 | 55.80 | 54.00 | **64.20** | 54.60 | 61.00 | 59.60 |
| **Average** | 32.41 | **34.68** | 33.19 | 34.09 | 40.78 | 42.63 | **43.95** | 31.78 |

## Citation

```bibtex
@article{kocabay2026diffutron,
  title={Diffutron: A Masked Diffusion Language Model for Turkish Language},
  author={Kocabay, Şuayp Talha and Akkuş, Talha Rüzgar},
  journal={arXiv: [cs.CL]},
  year={2026}
}
```