parity-embedding-juridico-br-v1

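Embedding model fine-tuned for the Brazilian legal and public procurement domain: licitações (public tenders), Lei 14.133/21 (Brazil's public procurement law), TCU (Tribunal de Contas da União) case law, and captação de capital sustentável (sustainable capital raising). Built on top of paraphrase-multilingual-MiniLM-L12-v2 (384-dimensional embeddings, about 118M parameters).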

Training

  • Base model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  • Dataset: Triplets mined from Parity's curated knowledge base (TCU acórdãos + súmulas, Lei 14.133/21 articles, IN SEGES, decretos).
  • Loss: MultipleNegativesRankingLoss (in-batch contrastive).
  • Strategy: self-positives (the tese and textoChave of the same acórdão), category-positives (acórdãos in the same legal category), and hard negatives (the most vector-similar acórdãos from different categories).
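
To make the loss above concrete, here is a minimal pure-Python sketch of an in-batch contrastive objective in the style of MultipleNegativesRankingLoss. It assumes cosine similarity and a scale of 20, which are the sentence-transformers defaults as far as we know; whether this model was trained with those defaults is not stated here. The 2-d vectors are toy values, not model output.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, hard_negatives, scale=20.0):
    """Mean cross-entropy where anchor i must pick positive i among all candidates."""
    # Candidates: every positive in the batch, followed by every hard negative.
    candidates = positives + hard_negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching positive (index i) as the target.
        total += log_sum_exp - logits[i]
    return total / len(anchors)

# Toy 2-d "embeddings" (assumed, illustrative values only).
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
hard_negatives = [[0.6, 0.8], [0.8, 0.6]]

aligned = mnr_loss(anchors, positives, hard_negatives)
swapped = mnr_loss(anchors, positives[::-1], hard_negatives)
print(f"aligned={aligned:.4f} swapped={swapped:.4f}")
```

The loss is low when each anchor sits closest to its own positive and rises sharply when the pairing is swapped, which is the signal that pulls a tese toward its textoChave and away from look-alike acórdãos in other categories.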

Usage

Sentence-Transformers (Python)

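from sentence_transformers import SentenceTransformer

# Load the fine-tuned model from the Hugging Face Hub.
model = SentenceTransformer("SamuelMauli/parity-embedding-juridico-br-v1")
# encode() returns a 384-dimensional vector for a single input string.
vector = model.encode("Acórdão 244/2021 limita atestado quantitativo a 50%")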
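
For retrieval, the vectors returned by encode are typically compared by cosine similarity. The sketch below shows one way to rank passages against a query; the helper names (normalize, rank_passages) are hypothetical, and the 3-d vectors are toy stand-ins for the 384-d embeddings the model produces.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def rank_passages(query_vec, passage_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    q = normalize(query_vec)
    scored = []
    for i, passage in enumerate(passage_vecs):
        p = normalize(passage)
        scored.append((i, sum(a * b for a, b in zip(q, p))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy 3-d vectors (assumed values) standing in for real 384-d embeddings.
query = [0.2, 0.9, 0.1]
passages = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.0, 0.1, 0.9]]

ranking = rank_passages(query, passages)
for index, score in ranking:
    print(index, round(score, 3))
```

With the real model, passing normalize_embeddings=True to encode should make the explicit normalization step unnecessary, since the returned vectors are then already unit length.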

@xenova/transformers (JavaScript / Node / Browser)

import { pipeline } from "@xenova/transformers";
const pipe = await pipeline(
  "feature-extraction",
  "SamuelMauli/parity-embedding-juridico-br-v1"
);
// Mean-pool the token embeddings and L2-normalize; out.data holds the 384 values.
const out = await pipe("texto jurídico", { pooling: "mean", normalize: true });

Caveats

  • Trained on a small (~130 triplets), domain-narrow dataset.
  • Tuned for retrieval among TCU acórdãos and Brazilian procurement literature; do NOT use it for general-purpose embedding tasks.
  • Output dim: 384 (preserves compatibility with vector(384) in pgvector).
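
As a rough illustration of the pgvector compatibility noted above, the sketch below checks the dimension and formats an embedding as pgvector text input. The helper name, the table name (acordaos), and the column names are hypothetical; the query string is shown only to illustrate pgvector's cosine distance operator (<=>) and is not executed here.

```python
EMBEDDING_DIM = 384  # output dimension stated in the caveats above

def to_pgvector_literal(vector, expected_dim=EMBEDDING_DIM):
    """Format a sequence of numbers as pgvector text input, e.g. '[0.1,0.2]'."""
    values = [float(x) for x in vector]
    if len(values) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(values)}")
    return "[" + ",".join(repr(v) for v in values) + "]"

# Hypothetical table and column names; <=> is pgvector's cosine distance operator.
NEAREST_SQL = (
    "SELECT id, embedding <=> %s::vector AS distance "
    "FROM acordaos ORDER BY distance LIMIT 5"
)

# A zero vector is used only to show the format; real input comes from the model.
literal = to_pgvector_literal([0.0] * EMBEDDING_DIM)
print(len(literal), literal[:12])
```

Checking the length before writing catches the case where embeddings from a different model are sent to a vector(384) column by mistake.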

Citation

Maintained by Doublethree / Parity (samuel.mauli@gmail.com).
