Upload folder using huggingface_hub
README.md
CHANGED
|
@@ -14,7 +14,7 @@ pipeline_tag: token-classification
|
|
| 14 |
|
| 15 |
# NedoTurkishTokenizer
|
| 16 |
|
| 17 |
-
**Turkish morphological tokenizer — TR-MMLU world record
|
| 18 |
|
| 19 |
NedoTurkishTokenizer performs linguistically-aware tokenization of Turkish text using morphological rules. Unlike BPE-based tokenizers, it produces meaningful morphological units (roots and suffixes) aligned with Turkish grammar, powered by [Zemberek NLP](https://github.com/ahmetaa/zemberek-nlp).
|
| 20 |
|
|
@@ -25,8 +25,8 @@ NedoTurkishTokenizer performs linguistically-aware tokenization of Turkish text
|
|
| 25 |
| **Developer** | [Ethosoft](https://huggingface.co/Ethosoft) |
|
| 26 |
| **Language** | Turkish (`tr`) |
|
| 27 |
| **License** | MIT |
|
| 28 |
-
| **Benchmark** | TR-MMLU **
|
| 29 |
-
| **Morphological engine** |
|
| 30 |
|
| 31 |
---
|
| 32 |
|
|
@@ -38,15 +38,7 @@ NedoTurkishTokenizer performs linguistically-aware tokenization of Turkish text
|
|
| 38 |
pip install git+https://huggingface.co/Ethosoft/NedoTurkishTokenizer
|
| 39 |
```
|
| 40 |
|
| 41 |
-
|
| 42 |
-
> If you get a Java error, install it first:
|
| 43 |
-
>
|
| 44 |
-
> | OS | Command |
|
| 45 |
-
> |---|---|
|
| 46 |
-
> | Ubuntu / Debian | `sudo apt install default-jre` |
|
| 47 |
-
> | Fedora / RHEL | `sudo dnf install java-latest-openjdk` |
|
| 48 |
-
> | macOS | `brew install openjdk` |
|
| 49 |
-
> | Windows | `winget install Microsoft.OpenJDK.21` |
|
| 50 |
|
| 51 |
---
|
| 52 |
|
|
|
|
| 14 |
|
| 15 |
# NedoTurkishTokenizer
|
| 16 |
|
| 17 |
+
**Turkish morphological tokenizer — TR-MMLU world record 95.45%**
|
| 18 |
|
| 19 |
NedoTurkishTokenizer performs linguistically-aware tokenization of Turkish text using morphological rules. Unlike BPE-based tokenizers, it produces meaningful morphological units (roots and suffixes) aligned with Turkish grammar, powered by [Zemberek NLP](https://github.com/ahmetaa/zemberek-nlp).
|
| 20 |
|
|
|
|
| 25 |
| **Developer** | [Ethosoft](https://huggingface.co/Ethosoft) |
|
| 26 |
| **Language** | Turkish (`tr`) |
|
| 27 |
| **License** | MIT |
|
| 28 |
+
| **Benchmark** | TR-MMLU **95.45%** (world record) |
|
| 29 |
+
| **Morphological engine** | zemberek-python |
|
| 30 |
|
| 31 |
---
|
| 32 |
|
|
|
|
| 38 |
pip install git+https://huggingface.co/Ethosoft/NedoTurkishTokenizer
|
| 39 |
```
|
| 40 |
|
| 41 |
+
|
| 42 |
|
| 43 |
---
|
| 44 |
|