---
widget:
- text: "gelirken bir litre [MASK] aldım."
  example_title: "ürün"
---

# turkish-tiny-bert-uncased

This is a Turkish Tiny uncased BERT model, developed to fill the gap in small-sized BERT models for Turkish. Since the model is uncased, it makes no distinction between "turkish" and "Turkish".

#### ⚠ Uncased use requires manual lowercase conversion

Note that, due to a [known issue](https://github.com/huggingface/transformers/issues/6680) with the tokenizer, the `do_lower_case = True` flag should **NOT** be used. Instead, convert your text to lowercase as follows:
```python
# Python's str.lower() maps "I" to "i"; Turkish lowercases "I" to dotless "ı".
text.replace("I", "ı").lower()
```
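
The one-liner above handles only the ASCII capital "I". As a slightly fuller illustration (a sketch, not part of the model card: the `turkish_lower` name and the dotted "İ" mapping are additions here), a helper that applies both Turkish casing rules:

```python
def turkish_lower(text: str) -> str:
    """Lowercase text using Turkish casing rules.

    Python's str.lower() maps "I" to "i", but Turkish lowercases
    "I" to dotless "ı" (and "İ" to "i"), so handle both up front.
    """
    return text.replace("I", "ı").replace("İ", "i").lower()

print(turkish_lower("İstanbul'dan Isparta'ya"))  # istanbul'dan ısparta'ya
```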

Be aware that this model may produce biased predictions, as it was trained primarily on crawled data, which can inherently contain various biases.

Other relevant information can be found in the [paper](https://arxiv.org/abs/2307.14134). 

# Example Usage

```python
from transformers import AutoTokenizer, BertForMaskedLM, pipeline

# Load the PyTorch weights
model = BertForMaskedLM.from_pretrained("turkish-tiny-bert-uncased")
# or load the TensorFlow weights instead:
# model = BertForMaskedLM.from_pretrained("turkish-tiny-bert-uncased", from_tf=True)

tokenizer = AutoTokenizer.from_pretrained("turkish-tiny-bert-uncased")

unmasker = pipeline("fill-mask", model=model, tokenizer=tokenizer)
unmasker("gelirken bir litre [MASK] aldım.")
# [{'score': 0.202457457780838,
#   'token': 2417,
#   'token_str': 'su',
#   'sequence': 'gelirken bir litre su aldım.'},
#  {'score': 0.09290537238121033,
#   'token': 11818,
#   'token_str': 'benzin',
#   'sequence': 'gelirken bir litre benzin aldım.'},
#  {'score': 0.07785643637180328,
#   'token': 2026,
#   'token_str': '##den',
#   'sequence': 'gelirken bir litreden aldım.'},
#  {'score': 0.06889808923006058,
#   'token': 2299,
#   'token_str': '##yi',
#   'sequence': 'gelirken bir litreyi aldım.'},
#  {'score': 0.03152570128440857,
#   'token': 2647,
#   'token_str': '##ye',
#   'sequence': 'gelirken bir litreye aldım.'}]
```
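
Note that lowercasing a raw input would also mangle the `[MASK]` token itself (`turkish_lower("[MASK]")` yields `"[mask]"`). A minimal sketch, reusing the hypothetical `turkish_lower` helper from above, that lowercases everything around the mask:

```python
def lower_keep_mask(text: str, mask_token: str = "[MASK]") -> str:
    # Lowercase each segment around the mask token, then stitch the
    # pieces back together so the tokenizer still sees "[MASK]".
    return mask_token.join(turkish_lower(p) for p in text.split(mask_token))

unmasker(lower_keep_mask("Gelirken bir litre [MASK] aldım."))
# same predictions as the all-lowercase call above
```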


# Acknowledgments
- Research supported with Cloud TPUs from [Google's TensorFlow Research Cloud](https://sites.research.google/trc/about/) (TFRC). Thanks for providing access to the TFRC ❤️
- Thanks to the generous support from the Hugging Face team, it is possible to download models from their S3 storage 🤗


# License

MIT