Patristic Latin Sentence Embeddings

This is a Latin sentence-transformers model fine-tuned from bowphs/LaBerta. It maps Latin sentences and paragraphs to a 768-dimensional dense vector space and can be used to detect semantic correspondences or for information retrieval (e.g. detecting quotations and allusions).

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base Model: bowphs/LaBerta
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: cosine
  • Pooling Mode: cls
  • Supported Modality: Text
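The card lists cosine as the similarity function. Cosine similarity is the dot product of two vectors divided by the product of their norms, so it depends only on direction and not on vector length. The following minimal sketch uses plain Python and toy 2-dimensional vectors in place of real 768-dimensional model output; the helper name `cosine_similarity` is illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors stand in for the model's 768-dimensional embeddings.
print(cosine_similarity([3.0, 4.0], [6.0, 8.0]))   # 1.0 (same direction, different length)
print(cosine_similarity([3.0, 4.0], [4.0, -3.0]))  # 0.0 (orthogonal)
```

Scores range from -1 to 1. Higher values indicate that two passages lie closer in the embedding space.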

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer(transformer_task=feature-extraction, architecture=RobertaModel)
  (1): Pooling(embedding_dimension=768, pooling_mode=cls)
)
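The second module uses CLS pooling: the sentence vector is the transformer's hidden state at the first token position. Because the base model is a `RobertaModel`, that position is presumably filled by the `<s>` token, which RoBERTa tokenizers prepend and which plays the CLS role. The following toy sketch in plain Python shows what this selection does. Mean pooling, a common alternative that this model does *not* use, is included only for contrast. Both function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def cls_pooling(token_embeddings: Matrix) -> List[float]:
    """Return (a copy of) the hidden state of the first token (<s>, RoBERTa's CLS token)."""
    return list(token_embeddings[0])


def mean_pooling(token_embeddings: Matrix, attention_mask: Sequence[int]) -> List[float]:
    """Average the hidden states of non-padding tokens (shown for contrast only)."""
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m]
    dim = len(token_embeddings[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy transformer output: 4 token positions x 2 dimensions; the last position is padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
print(cls_pooling(hidden))         # [1.0, 2.0]
print(mean_pooling(hidden, mask))  # [3.0, 4.0]
```

With CLS pooling, padding never affects the result because only position 0 is read.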

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then load the model and encode sentences:

from sentence_transformers import SentenceTransformer

# Download the model from the Hugging Face Hub
model = SentenceTransformer("TdelaSelle/PatriLaSE")

# Sentences to encode
sentences = [
    "quis ergo sanat omnes languores tuos nisi qui propitius fit omnibus iniquitatibus tuis?",
    "qui propitiatur omnibus iniquitatibus tuis qui sanat omnes infirmitates tuas",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (2, 768)
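To search for quotations or allusions, a typical next step is to rank candidate passages against a query embedding. The following sketch does this in plain Python on precomputed vectors, so it runs without downloading the model. In practice the vectors would come from `model.encode(...)`, and sentence-transformers also provides `model.similarity(...)` and `sentence_transformers.util.semantic_search` for the same task. The function name `find_candidates` and the 0.5 threshold are hypothetical and have not been tuned for this model.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def find_candidates(
    query: Sequence[float],
    corpus: Sequence[Sequence[float]],
    top_k: int = 3,
    threshold: float = 0.5,  # hypothetical cut-off, not tuned for PatriLaSE
) -> List[Tuple[int, float]]:
    """Return (corpus index, cosine score) pairs at or above threshold, best first."""
    scored = [(i, cosine(query, vec)) for i, vec in enumerate(corpus)]
    scored = [pair for pair in scored if pair[1] >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 2-d vectors standing in for model.encode(...) output.
query = [1.0, 0.0]
corpus = [[0.0, 1.0], [3.0, 4.0], [1.0, 0.0]]
print(find_candidates(query, corpus))  # [(2, 1.0), (1, 0.6)]
```

A fixed threshold works as a simple filter for flagging possible quotations. Any real cut-off would need to be calibrated on annotated examples.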

Training Details

Training Dataset

  • Size: 183,227 training samples
  • Corpus: Patristic Latin sentences
  • Loss: MaskedDenoisingAutoEncoderLoss
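The card names the loss but does not describe how it corrupts its input. Denoising auto-encoder objectives such as TSDAE corrupt each sentence and train the encoder so that a decoder can reconstruct the original sentence from the sentence embedding alone. TSDAE's default noise deletes tokens. The loss name suggests that this model may instead *mask* tokens, but that is an inference from the name only. The following is a hypothetical sketch of such a masking step on whitespace tokens. The function name `mask_tokens`, the 0.6 ratio (borrowed from TSDAE's default deletion ratio), and the `<mask>` string are all assumptions, not details taken from this model's training code.

```python
import random
from typing import List


def mask_tokens(
    tokens: List[str],
    ratio: float = 0.6,          # assumption: borrowed from TSDAE's default deletion ratio
    mask_token: str = "<mask>",  # assumption: RoBERTa-style mask token
    seed: int = 0,
) -> List[str]:
    """Hypothetical noise step: replace a fixed fraction of tokens with a mask token."""
    if not tokens:
        return []
    n_mask = max(1, int(len(tokens) * ratio))  # always corrupt at least one token
    rng = random.Random(seed)                  # seeded for reproducibility
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [mask_token if i in positions else tok for i, tok in enumerate(tokens)]


sentence = "qui propitiatur omnibus iniquitatibus tuis".split()
noisy = mask_tokens(sentence)
print(noisy)  # three of the five words are replaced by "<mask>"
```

During training, the pair (noisy input, original sentence) would serve as the (input, reconstruction target) for the auto-encoder.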

Training Hyperparameters

  • epochs: 3
  • batch_size: 16
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • warmup_ratio: 0.0
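The hyperparameters above determine the length of the training run. The following arithmetic sketch assumes a single device, no gradient accumulation, and that the final partial batch of each epoch is kept. The card states none of these assumptions. The helper name `training_steps` is illustrative.

```python
import math


def training_steps(n_samples: int, batch_size: int, epochs: int, warmup_ratio: float):
    """Optimizer steps, assuming one device, no gradient accumulation,
    and that the last partial batch of each epoch is kept (drop_last=False)."""
    steps_per_epoch = math.ceil(n_samples / batch_size)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return steps_per_epoch, total_steps, warmup_steps


print(training_steps(183_227, 16, 3, 0.0))  # (11452, 34356, 0)
```

Under these assumptions, the run is about 34,356 optimizer steps. With `warmup_ratio` at 0.0, the learning rate starts at 2e-05 with no warmup phase.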

Framework Versions

  • pytorch: 2.11.0+cu130
  • sentence_transformers: 5.4.1
  • transformers: 4.57.6
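To compare a local environment with the versions reported above, the installed package metadata can be read with the standard library. This is only a convenience check. The card does not say that other versions are unsupported. Note that PyTorch is installed as `torch` and that its version string carries a build suffix (here `+cu130`), so a CPU or different-CUDA build will always show as a mismatch. The names `CARD_VERSIONS` and `compare_versions` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Versions reported in this card; PyTorch's distribution name is "torch".
CARD_VERSIONS = {
    "torch": "2.11.0+cu130",
    "sentence-transformers": "5.4.1",
    "transformers": "4.57.6",
}


def compare_versions(expected: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each package to its installed version (None if absent) and
    print one line for every package whose version differs from the card."""
    installed: Dict[str, Optional[str]] = {}
    for package, wanted in expected.items():
        try:
            installed[package] = version(package)
        except PackageNotFoundError:
            installed[package] = None
        if installed[package] != wanted:
            print(f"{package}: card lists {wanted}, found {installed[package]}")
    return installed


compare_versions(CARD_VERSIONS)
```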