This model is fine-tuned on:

  • The Latin Library (15M tokens)
  • Perseus Project (15M tokens)

The dataset was cleaned as follows:

  • Removal of all "pseudo-Latin" text ("Lorem ipsum ...").
  • Use of CLTK for sentence splitting and normalisation.
  • Deduplication of the corpus.
  • Lowercasing of all text.
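
The card does not include the preprocessing script. Below is a minimal Python sketch of the four cleaning steps, assuming plain-text documents as input. It uses only the standard library. A naive regex sentence splitter and Unicode NFC normalisation stand in for CLTK. The "lorem ipsum" filter, the sentence-level deduplication and the names `normalise` and `clean_corpus` are all hypothetical and are not the author's code.

```python
import re
import unicodedata
from collections.abc import Iterable

# Hypothetical filter for "pseudo-Latin" filler; the real criterion is undocumented.
LOREM_RE = re.compile(r"\blorem\s+ipsum\b", re.IGNORECASE)

# Naive splitter on ., ! or ? followed by whitespace (stand-in for CLTK).
SENT_RE = re.compile(r"(?<=[.!?])\s+")


def normalise(text: str) -> str:
    """Assumed normalisation: Unicode NFC plus whitespace collapsing."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_corpus(documents: Iterable[str]) -> list[str]:
    """Return lowercased, deduplicated sentences from the input documents."""
    seen: set[str] = set()
    sentences: list[str] = []
    for doc in documents:
        # Step 1: drop whole documents that contain pseudo-Latin filler.
        if LOREM_RE.search(doc):
            continue
        # Step 2: normalise, then split into sentences.
        for sent in SENT_RE.split(normalise(doc)):
            # Step 4 (applied before dedup here): lowercase the sentence.
            sent = sent.lower()
            if not sent:
                continue
            # Step 3: keep only the first occurrence of each sentence.
            if sent in seen:
                continue
            seen.add(sent)
            sentences.append(sent)
    return sentences
```

For example, `clean_corpus(["Gallia est omnis divisa in partes tres. Gallia est omnis divisa in partes tres.", "Lorem ipsum dolor sit amet.", "GALLIA est omnis divisa in partes tres.  Arma virumque cano."])` keeps two sentences. This sketch lowercases before deduplicating, so sentences that differ only in case count as duplicates. The card does not say in which order the two steps were applied.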

Model details:

  • Format: Safetensors
  • Model size: 278M parameters
  • Tensor type: F32