MANX Colab Notebook

This is a ByT5-small model fine-tuned for early Middle English lemmatization. It is a proof of concept (PoC). The model was fine-tuned on series of 11-grams extracted from the eLAEME corpus, each prefixed with "Lemmatize: ". It is not intended to serve as a general lemmatizer for all kinds of Middle English texts, because eLAEME employs bespoke transcription rules that diverge from common transcription conventions.
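To make the input format concrete, here is a minimal sketch of how prefixed 11-gram inputs could be built from a tokenized text. Only the prefix "Lemmatize: " and the window length of 11 come from the description above. The helper name `ngram_inputs`, whitespace-joined tokens, a sliding window with a stride of one token, and the handling of texts shorter than 11 tokens are all assumptions for illustration, not the exact preprocessing used for fine-tuning.

```python
from typing import List

PREFIX = "Lemmatize: "  # task prefix named in the model card
WINDOW = 11             # n-gram length named in the model card


def ngram_inputs(tokens: List[str], n: int = WINDOW, prefix: str = PREFIX) -> List[str]:
    """Return prefixed, space-joined windows of n tokens (hypothetical helper).

    Assumption: windows slide one token at a time. A text with at most n
    tokens yields a single, possibly shorter window; the original pipeline
    may treat short texts differently.
    """
    if n < 1:
        raise ValueError("n must be positive")
    if not tokens:
        return []
    if len(tokens) <= n:
        return [prefix + " ".join(tokens)]
    return [prefix + " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

Calling `ngram_inputs(text.split())` on a transcribed line returns one input string per window, each starting with the task prefix.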

The manx package, which you can use with this model, is available at https://github.com/mdm-code/manx. The package gives a more general view of the data used to fine-tune the model: it lets you download the corpus files, parse them, and prepare them for fine-tuning the base model checkpoint. The repository also links to a Colab notebook and a ready-made API that lemmatizes the texts you send to it.
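For readers who want to try the checkpoint without the manx package, the following is a minimal sketch that loads it through the Hugging Face `transformers` library. It does not use or reproduce the manx API. The function names `build_prompt` and `lemmatize`, the `max_new_tokens` value, and the assumption that the model expects the same "Lemmatize: " prefix at inference time as during fine-tuning are illustrative choices, not documented behavior.

```python
MODEL_ID = "mdm-code/me-lemmatize-byt5-small"  # repository named in the model card
PREFIX = "Lemmatize: "                         # task prefix named in the model card


def build_prompt(text: str) -> str:
    """Normalize whitespace and prepend the task prefix (hypothetical helper)."""
    return PREFIX + " ".join(text.split())


def lemmatize(text: str, max_new_tokens: int = 256) -> str:
    """Generate lemmas for a short span of text (a sketch, not the manx API).

    Assumption: inputs of roughly 11 tokens match the fine-tuning data best.
    ByT5 works on bytes, so max_new_tokens counts bytes rather than words.
    """
    # Imported lazily so build_prompt stays usable without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_prompt(text), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `lemmatize` downloads the weights on first use and requires `transformers` and PyTorch. For repeated calls, loading the tokenizer and model once outside the function would avoid reloading them each time.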

Please reference this Hugging Face repository (https://huggingface.co/mdm-code/me-lemmatize-byt5-small) and the manx GitHub repository (https://github.com/mdm-code/manx) whenever you use this model in your own research. The model and the package are published under the GPL-3 license.

Model size: 300M params (Safetensors). Tensor type: F32.