# turkish-tiny-bert-uncased

This is a Turkish Tiny uncased BERT model, developed to fill the gap left by the lack of small-sized BERT models for Turkish. Since the model is uncased, it does not distinguish between turkish and Turkish.

#### ⚠ Uncased use requires manual lowercase conversion

Please note that due to a [known issue](https://github.com/huggingface/transformers/issues/6680) with the tokenizer, the `do_lower_case = True` flag should not be used with the tokenizer. Instead, convert your text to lower case as follows:
```python
text.replace("I", "ı").lower()
```
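To illustrate why plain `str.lower()` is not enough here: Python lowercases with English casing rules, so capital dotless `I` becomes `i` instead of the Turkish `ı`. A minimal sketch (the helper name `turkish_lower` is hypothetical, not part of the model's API):

```python
def turkish_lower(text: str) -> str:
    # Map capital dotless I ("I" -> "ı") before lowercasing, since
    # Python's str.lower() uses English casing rules ("I" -> "i").
    return text.replace("I", "ı").lower()

print(turkish_lower("ISPARTA"))  # ısparta
print("ISPARTA".lower())         # isparta (wrong for Turkish)
```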

Please be aware that this model may exhibit biased predictions, as it was trained primarily on crawled data, which inherently can contain various biases.

Other relevant information can be found in the [paper](https://arxiv.org/abs/2307.14134).

```python
from transformers import AutoTokenizer, BertForMaskedLM, pipeline

model = BertForMaskedLM.from_pretrained("turkish-tiny-bert-uncased")
# or, to load from a TensorFlow checkpoint:
# model = BertForMaskedLM.from_pretrained("turkish-tiny-bert-uncased", from_tf=True)

tokenizer = AutoTokenizer.from_pretrained("turkish-tiny-bert-uncased")

unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
unmasker("gelirken bir litre [MASK] aldım.")
# [{'score': 0.202457457780838,
#   'token': 2417,
#   'token_str': 'su',
#   'sequence': 'gelirken bir litre su aldım.'},
#  {'score': 0.09290537238121033,
#   'token': 11818,
#   'token_str': 'benzin',
#   'sequence': 'gelirken bir litre benzin aldım.'},
#  {'score': 0.07785643637180328,
#   'token': 2026,
#   'token_str': '##den',
#   'sequence': 'gelirken bir litreden aldım.'},
#  {'score': 0.06889808923006058,
#   'token': 2299,
#   'token_str': '##yi',
#   'sequence': 'gelirken bir litreyi aldım.'},
#  {'score': 0.03152570128440857,
#   'token': 2647,
#   'token_str': '##ye',
#   'sequence': 'gelirken bir litreye aldım.'}]
```
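The fill-mask pipeline returns a plain list of dicts like the one shown above, so the top prediction can be pulled out with ordinary Python; a minimal sketch using results in that same shape (the values below are copied from the example output, not recomputed):

```python
# Results in the same shape the fill-mask pipeline returns above
predictions = [
    {"score": 0.2025, "token": 2417, "token_str": "su",
     "sequence": "gelirken bir litre su aldım."},
    {"score": 0.0929, "token": 11818, "token_str": "benzin",
     "sequence": "gelirken bir litre benzin aldım."},
]

# The pipeline already sorts by score; max() makes the choice explicit
best = max(predictions, key=lambda p: p["score"])
print(best["sequence"])  # gelirken bir litre su aldım.
```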

# Acknowledgments
- Research supported with Cloud TPUs from [Google's TensorFlow Research Cloud](https://sites.research.google/trc/about/) (TFRC). Thanks for providing access to the TFRC ❤️
- Thanks to the generous support from the Hugging Face team, it is possible to download models from their S3 storage 🤗

# License

MIT