widget:
  - text: gelirken bir litre [MASK] aldım.
    example_title: ürün

turkish-tiny-bert-uncased

This is a Turkish Tiny uncased BERT model, developed to fill the gap in small-sized BERT models for Turkish. Because this model is uncased, it does not distinguish between turkish and Turkish.

⚠ Uncased use requires manual lowercase conversion

Note that due to a known issue with the tokenizer, the do_lower_case = True flag should NOT be used with the tokenizer. Instead, convert your text to lower case as follows:

text.replace("I", "ı").lower()
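To illustrate why the replacement step matters, here is a minimal sketch that relies only on standard Python string behavior: `str.lower()` maps the capital `I` to the dotted `i`, whereas Turkish pairs it with the dotless `ı`. The helper name `turkish_lower` is hypothetical and is not part of the model or the tokenizer.

```python
def turkish_lower(text: str) -> str:
    """Lowercase text after mapping capital I to dotless ı (hypothetical helper)."""
    return text.replace("I", "ı").lower()


word = "IŞIK"
plain = word.lower()               # default rule: "I" becomes dotted "i"
normalized = turkish_lower(word)   # replacement first: "I" becomes dotless "ı"
print(plain, normalized)
```

Only the second form matches the lowercase spelling a Turkish reader would expect, which is presumably the form the model saw during training.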

Be aware that this model may exhibit biased predictions, as it was trained primarily on crawled data, which can inherently contain various biases.

Other relevant information can be found in the paper.

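from transformers import AutoTokenizer, BertForMaskedLM, pipeline

# Load the masked-language-model weights (assumes the model files are available under this name or path)
model = BertForMaskedLM.from_pretrained("turkish-tiny-bert-uncased")
# or, to load from TensorFlow weights:
# model = BertForMaskedLM.from_pretrained("turkish-tiny-bert-uncased", from_tf=True)

# Load the matching tokenizer; do not enable do_lower_case (see the note above)
tokenizer = AutoTokenizer.from_pretrained("turkish-tiny-bert-uncased")

# Build a fill-mask pipeline and query it with already-lowercased text
unmasker = pipeline("fill-mask", model=model, tokenizer=tokenizer)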
unmasker("gelirken bir litre [MASK] aldım.")
# [{'score': 0.202457457780838,
#   'token': 2417,
#   'token_str': 'su',
#   'sequence': 'gelirken bir litre su aldım.'},
#  {'score': 0.09290537238121033,
#   'token': 11818,
#   'token_str': 'benzin',
#   'sequence': 'gelirken bir litre benzin aldım.'},
#  {'score': 0.07785643637180328,
#   'token': 2026,
#   'token_str': '##den',
#   'sequence': 'gelirken bir litreden aldım.'},
#  {'score': 0.06889808923006058,
#   'token': 2299,
#   'token_str': '##yi',
#   'sequence': 'gelirken bir litreyi aldım.'},
#  {'score': 0.03152570128440857,
#   'token': 2647,
#   'token_str': '##ye',
#   'sequence': 'gelirken bir litreye aldım.'}]
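If the input still contains uppercase letters, lowercasing the whole string would also turn `[MASK]` into `[mask]`, which would likely no longer be recognized as the mask token. The following is a sketch of one way to avoid that, assuming the mask token is the literal `[MASK]` used in the example above; the helper names `turkish_lower` and `lowercase_keep_mask` are hypothetical.

```python
MASK_TOKEN = "[MASK]"  # mask token as written in the example above


def turkish_lower(text: str) -> str:
    """Lowercase text after mapping capital I to dotless ı (hypothetical helper)."""
    return text.replace("I", "ı").lower()


def lowercase_keep_mask(text: str, mask_token: str = MASK_TOKEN) -> str:
    """Lowercase everything except the mask token itself (hypothetical helper)."""
    # Split on the mask token, lowercase each surrounding piece, then rejoin.
    parts = text.split(mask_token)
    return mask_token.join(turkish_lower(part) for part in parts)


prepared = lowercase_keep_mask("Gelirken bir litre [MASK] aldım.")
print(prepared)
```

The prepared string can then be passed to the pipeline, for example `unmasker(prepared)`.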

Acknowledgments

  • Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC). Thanks for providing access to the TFRC ❤️
  • Thanks to the generous support from the Hugging Face team, it is possible to download models from their S3 storage 🤗

License

MIT