nmstech committed on
Commit b719e3c · verified · 1 Parent(s): 2064cba

Upload folder using huggingface_hub

Files changed (1): README.md (+4 -12)
@@ -14,7 +14,7 @@ pipeline_tag: token-classification
 
 # NedoTurkishTokenizer
 
-**Turkish morphological tokenizer — TR-MMLU world record 92.64%**
+**Turkish morphological tokenizer — TR-MMLU world record 95.45%**
 
 NedoTurkishTokenizer performs linguistically-aware tokenization of Turkish text using morphological rules. Unlike BPE-based tokenizers, it produces meaningful morphological units (roots and suffixes) aligned with Turkish grammar, powered by [Zemberek NLP](https://github.com/ahmetaa/zemberek-nlp).
 
@@ -25,8 +25,8 @@ NedoTurkishTokenizer performs linguistically-aware tokenization of Turkish text
 | **Developer** | [Ethosoft](https://huggingface.co/Ethosoft) |
 | **Language** | Turkish (`tr`) |
 | **License** | MIT |
-| **Benchmark** | TR-MMLU **92.64%** (world record) |
-| **Morphological engine** | Zemberek NLP (bundled) |
+| **Benchmark** | TR-MMLU **95.45%** (world record) |
+| **Morphological engine** | zemberek-python |
 
 ---
 
@@ -38,15 +38,7 @@ NedoTurkishTokenizer performs linguistically-aware tokenization of Turkish text
 pip install git+https://huggingface.co/Ethosoft/NedoTurkishTokenizer
 ```
 
-> **Java is required** for Zemberek morphological analysis.
-> If you get a Java error, install it first:
->
-> | OS | Command |
-> |---|---|
-> | Ubuntu / Debian | `sudo apt install default-jre` |
-> | Fedora / RHEL | `sudo dnf install java-latest-openjdk` |
-> | macOS | `brew install openjdk` |
-> | Windows | `winget install Microsoft.OpenJDK.21` |
+
 
 ---
 