StyleECU-es

StyleECU-es is a style embedding model for Spanish, obtained by fine-tuning mStyleDistance on SynthSTEL-ES, a purpose-built Spanish contrastive dataset of 51,400 triplets covering 71 stylistic dimensions.

Model Description

StyleECU-es specializes the mStyleDistance embedding space toward the stylistic phenomena most relevant to Spanish, including dialectal variation (voseo/tuteo), expressive morphology, syntactic complexity, and digital style.

Training

  • Base model: StyleDistance/mstyledistance
  • Training objective: TripletLoss (contrastive learning)
  • Dataset: style-anon/SynthSTEL-ES
  • Training size: 51,400 triplets
  • Epochs: 2
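
The card names TripletLoss as the objective but does not state the distance metric or the margin. As a minimal sketch, assuming Euclidean distance and a margin of 5.0 (the defaults of the sentence-transformers `TripletLoss` class, not confirmed settings for this model), the loss for a single triplet could be computed as follows. The function names are illustrative only, and the vectors stand in for the embeddings of an anchor text, a positive text that presumably shares the anchor's style, and a negative text that does not.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=5.0):
    """Hinge-style triplet loss for one (anchor, positive, negative) triplet.

    margin=5.0 is an assumed value, not a setting confirmed by the model card.
    """
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    # The loss is zero once the negative is at least `margin` farther
    # from the anchor than the positive is.
    return max(0.0, d_pos - d_neg + margin)
```

Minimizing this quantity pulls same-style texts together and pushes different-style texts apart in the embedding space, which is the sense in which the objective is contrastive.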

Usage

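from sentence_transformers import SentenceTransformer

# Load the fine-tuned model from the Hugging Face Hub
model = SentenceTransformer("style-anon/StyleECU-es")
# Encode a list of texts; the result holds one style embedding per text
embeddings = model.encode(["Your text here"])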
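
A typical use of style embeddings is to compare two texts. The sketch below is one way to do that, assuming cosine similarity is an appropriate measure for this embedding space (the card does not say which measure to use). `cosine_similarity` and `style_similarity` are hypothetical helper names; `model` is any object whose `encode` method returns one vector per input text, such as the `SentenceTransformer` instance loaded above.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero, equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def style_similarity(model, text_a, text_b):
    """Encode two texts with `model` and return their cosine similarity."""
    # encode() is given both texts at once and yields one embedding per text.
    emb_a, emb_b = model.encode([text_a, text_b])
    return cosine_similarity(emb_a, emb_b)
```

For example, `style_similarity(model, "¿Vos qué querés?", "¿Tú qué quieres?")` would compare a voseo sentence with its tuteo counterpart; the example sentences are illustrative and no particular score is implied.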

Evaluation

Evaluated on PAN author profiling tasks (Spanish):

| Task | Base (mStyleDistance) | StyleECU-es | Δ |
|---|---|---|---|
| PAN 2018 – Gender prediction | baseline | +3 pp | +3 pp |
| PAN 2021 – Hate speech spreaders | 0.70 | 0.81 | +11 pp |
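
The Δ column is reported in percentage points (pp), that is, the absolute difference between two scores rather than a relative improvement. As a small sketch, assuming the scores are fractions in [0, 1] (the card does not name the metric), the PAN 2021 row can be reproduced like this; `delta_pp` is a hypothetical helper name.

```python
def delta_pp(base_score, new_score):
    """Absolute difference between two scores in [0, 1], in percentage points."""
    # Multiply by 100 to go from a fraction to percentage points,
    # and round to one decimal to hide floating-point noise.
    return round((new_score - base_score) * 100, 1)
```

Calling `delta_pp(0.70, 0.81)` gives 11.0, matching the +11 pp shown for PAN 2021.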

Authors

Citation

If you use this model, please cite:

Paper under review. Citation will be updated upon publication.
