Logion: Machine Learning for Greek Philology

The most advanced Ancient Greek BERT model trained to date! Read the paper on arXiv (arXiv:2305.01099) by Charlie Cowen-Breen, Creston Brooks, Johannes Haubold, and Barbara Graziosi.

We train a WordPiece tokenizer with a vocabulary size of 50,000 on a corpus of over 70 million words of premodern Greek. We then use this tokenizer and the same corpus to train a BERT model.
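For readers who want to train a comparable tokenizer on their own Greek corpus, here is a minimal sketch using the Hugging Face `tokenizers` library. It is not necessarily the procedure used for Logion: the helper name `train_wordpiece`, the `min_frequency` value, and the choice to keep capitalisation and accents (`lowercase=False`, `strip_accents=False`, which seems sensible for polytonic Greek) are all assumptions.

```python
import os
from typing import Iterable

from tokenizers import BertWordPieceTokenizer


def train_wordpiece(lines: Iterable[str], out_dir: str, vocab_size: int = 50_000) -> str:
    """Train a BERT-style WordPiece tokenizer on `lines` and return the vocab.txt path.

    Hypothetical helper: the settings below are assumptions, not the Logion configuration.
    """
    tokenizer = BertWordPieceTokenizer(
        lowercase=False,      # assumption: keep capitalisation
        strip_accents=False,  # assumption: keep polytonic accents and breathings
    )
    tokenizer.train_from_iterator(
        iter(lines),
        vocab_size=vocab_size,
        min_frequency=2,  # assumption: ignore pairs seen only once
        special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
        show_progress=False,
    )
    os.makedirs(out_dir, exist_ok=True)
    # save_model writes a vocab.txt file and returns the list of written paths
    return tokenizer.save_model(out_dir)[0]
```

The resulting `vocab.txt` can then be loaded with `BertTokenizer(vocab_file=..., do_lower_case=False)` from `transformers`, so that the same casing and accent handling is used at inference time.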

Further information on this project and code for error detection can be found on GitHub.

We are adding more models trained on cleaner data and with different tokenizations, so keep an eye out!

How to use

Requirements:

pip install transformers torch

Load the model and tokenizer directly from the HuggingFace Model Hub:

from transformers import BertTokenizer, BertForMaskedLM

# The tokenizer and the masked-language model are stored in the same Hub repository
tokenizer = BertTokenizer.from_pretrained("cabrooks/LOGION-50k_wordpiece")
model = BertForMaskedLM.from_pretrained("cabrooks/LOGION-50k_wordpiece")
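Once loaded, the model can rank candidate tokens for a masked position. The helper below is a minimal sketch of one way to do this; the name `top_k_mask_predictions` is hypothetical and the example sentence is only illustrative. Depending on the tokenizer's configuration, predictions may be WordPiece fragments (prefixed with `##`) rather than whole words. The `transformers` `pipeline("fill-mask", ...)` is an alternative that does something similar.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer


def top_k_mask_predictions(text, tokenizer, model, k=5):
    """Return (token, probability) pairs for the k most likely fillers of the first [MASK].

    Hypothetical helper; `text` must contain the tokenizer's mask token.
    """
    inputs = tokenizer(text, return_tensors="pt")
    # Find the position(s) of the mask token in the encoded sequence
    mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    if len(mask_positions) == 0:
        raise ValueError("input text contains no mask token")
    pos = mask_positions[0].item()
    model.eval()  # disable dropout so predictions are deterministic
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)
    probs = torch.softmax(logits[0, pos], dim=-1)
    top = torch.topk(probs, k)  # sorted in descending order of probability
    tokens = tokenizer.convert_ids_to_tokens(top.indices.tolist())
    return list(zip(tokens, top.values.tolist()))


# Example (downloads the model from the Hub):
# tokenizer = BertTokenizer.from_pretrained("cabrooks/LOGION-50k_wordpiece")
# model = BertForMaskedLM.from_pretrained("cabrooks/LOGION-50k_wordpiece")
# print(top_k_mask_predictions("ἐν ἀρχῇ ἦν ὁ [MASK].", tokenizer, model))
```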

Cite

If you use this model in your research, please cite the paper:

@misc{logion-base,
      title={Logion: Machine Learning for Greek Philology}, 
      author={Cowen-Breen, C. and Brooks, C. and Haubold, J. and Graziosi, B.},
      year={2023},
      eprint={2305.01099},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}