---
widget:
- text: "gelirken bir litre [MASK] aldım."
example_title: "ürün"
---
# turkish-tiny-bert-uncased
This is a Turkish tiny uncased BERT model, developed to fill the gap left by the lack of small-sized BERT models for Turkish. Since this model is uncased, it does not distinguish between turkish and Turkish.
#### ⚠ Uncased use requires manual lowercase conversion
Note that due to a [known issue](https://github.com/huggingface/transformers/issues/6680) with the tokenizer, the `do_lower_case=True` flag should **NOT** be passed to the tokenizer. Instead, convert your text to lower case as follows:
```python
# Turkish lowercases "I" to dotless "ı"; plain str.lower() would map it to "i"
text = text.replace("I", "ı").lower()
```
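In practice, you may want to wrap this conversion in a small helper and apply it to every input before tokenization. The name `turkish_lower` below is illustrative, not part of the `transformers` API:

```python
def turkish_lower(text: str) -> str:
    # Map uppercase "I" to dotless "ı" first, then lowercase the rest
    return text.replace("I", "ı").lower()

turkish_lower("Ilık bir SU aldım")  # -> "ılık bir su aldım"
```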
Be aware that this model may produce biased predictions, as it was trained primarily on crawled data, which can contain various biases.
Other relevant information can be found in the [paper](https://arxiv.org/abs/2307.14134).
```python
from transformers import AutoTokenizer, BertForMaskedLM, pipeline

# Load the PyTorch weights
model = BertForMaskedLM.from_pretrained("turkish-tiny-bert-uncased")
# or, to load from the TensorFlow checkpoint:
# model = BertForMaskedLM.from_pretrained("turkish-tiny-bert-uncased", from_tf=True)

tokenizer = AutoTokenizer.from_pretrained("turkish-tiny-bert-uncased")

unmasker = pipeline("fill-mask", model=model, tokenizer=tokenizer)
unmasker("gelirken bir litre [MASK] aldım.")
# [{'score': 0.202457457780838,
# 'token': 2417,
# 'token_str': 'su',
# 'sequence': 'gelirken bir litre su aldım.'},
# {'score': 0.09290537238121033,
# 'token': 11818,
# 'token_str': 'benzin',
# 'sequence': 'gelirken bir litre benzin aldım.'},
# {'score': 0.07785643637180328,
# 'token': 2026,
# 'token_str': '##den',
# 'sequence': 'gelirken bir litreden aldım.'},
# {'score': 0.06889808923006058,
# 'token': 2299,
# 'token_str': '##yi',
# 'sequence': 'gelirken bir litreyi aldım.'},
# {'score': 0.03152570128440857,
# 'token': 2647,
# 'token_str': '##ye',
# 'sequence': 'gelirken bir litreye aldım.'}]
```
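The pipeline is the simplest route, but the same top-5 predictions can be reproduced by calling the model directly and taking a softmax over the `[MASK]` position. This is a minimal sketch that reuses the `model` and `tokenizer` objects loaded above:

```python
import torch

text = "gelirken bir litre [MASK] aldım."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and rank vocabulary candidates by probability
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
probs = logits[0, mask_pos].softmax(dim=-1)
top = torch.topk(probs, 5)

for score, token_id in zip(top.values[0], top.indices[0]):
    print(tokenizer.convert_ids_to_tokens(int(token_id)), float(score))
# e.g. su 0.2024..., benzin 0.0929..., ##den 0.0778..., ...
```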
# Acknowledgments
- Research supported with Cloud TPUs from [Google's TensorFlow Research Cloud](https://sites.research.google/trc/about/) (TFRC). Thanks for providing access to the TFRC ❤️
- Thanks to the generous support from the Hugging Face team, it is possible to download models from their S3 storage 🤗
# License
MIT