parity-embedding-juridico-br-v1
Embedding model fine-tuned for the Brazilian legal and public procurement
domain (public tenders, Lei 14.133/21, TCU case law, sustainable capital
raising). Built on top of paraphrase-multilingual-MiniLM-L12-v2
(384 dim, 33M params).
Training
- Base model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- Dataset: triplets mined from Parity's curated knowledge base: TCU acórdãos (rulings) and súmulas (precedent summaries), Lei 14.133/21 articles, IN SEGES normative instructions, and decrees.
- Loss: MultipleNegativesRankingLoss (in-batch contrastive).
- Strategy: self-positives (tese vs textoChave of the same acórdão) + category-positives (acórdãos in the same legal category) + hard negatives (top vector-similar acórdãos from different categories).
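To make the in-batch contrastive idea concrete, here is a minimal pure-Python sketch of the kind of objective MultipleNegativesRankingLoss computes: each anchor must rank its own positive above every other positive in the batch and above the mined hard negatives. It is an illustration only, not the library's implementation or the training script used for this model. The cosine similarity and scale of 20 mirror the sentence-transformers defaults as far as we know, and the 2-d vectors are made-up stand-ins for 384-d embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_ranking_loss(anchors, positives, negatives=(), scale=20.0):
    """Mean cross-entropy where anchor i must rank positive i first.

    The candidates for every anchor are all positives in the batch followed
    by all hard negatives, so other rows' positives act as extra negatives.
    """
    candidates = list(positives) + list(negatives)
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of the matching positive (index i).
        total += log_sum - logits[i]
    return total / len(anchors)


# Toy 2-d vectors standing in for 384-d embeddings (illustrative values only).
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
hard_negatives = [[0.7, 0.7], [0.6, 0.8]]

aligned = in_batch_ranking_loss(anchors, positives, hard_negatives)
swapped = in_batch_ranking_loss(anchors, positives[::-1], hard_negatives)
print(aligned < swapped)
```

The loss is low when each anchor sits closest to its own positive and rises sharply when the pairs are swapped, which is the signal that pulls matching tese/textoChave pairs together during fine-tuning.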
Usage
Sentence-Transformers (Python)
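from sentence_transformers import SentenceTransformer
# Load the fine-tuned model from the Hugging Face Hub.
m = SentenceTransformer("SamuelMauli/parity-embedding-juridico-br-v1")
# Encode one sentence into a 384-dimensional vector.
v = m.encode("Acórdão 244/2021 limita atestado quantitativo a 50%")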
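Once texts are encoded, retrieval comes down to ranking stored vectors by similarity to the query vector. The sketch below shows that step in plain Python, assuming cosine similarity as the ranking score. The document ids and 3-d vectors are hypothetical placeholders; real vectors would come from m.encode(...) and have 384 dimensions.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query_vec, corpus, k=2):
    """Return the k (score, doc_id) pairs most similar to the query."""
    scored = ((cosine(query_vec, vec), doc_id) for doc_id, vec in corpus.items())
    return heapq.nlargest(k, scored)


# Hypothetical ids and toy vectors, for illustration only.
corpus = {
    "acordao-a": [0.9, 0.1, 0.0],
    "acordao-b": [0.0, 1.0, 0.0],
    "sumula-c": [0.7, 0.0, 0.7],
}
query = [1.0, 0.0, 0.0]

results = top_k(query, corpus, k=2)
for score, doc_id in results:
    print(f"{doc_id}: {score:.3f}")
```

For more than a few thousand documents, a vector index such as pgvector is usually a better fit than a linear scan like this one.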
@xenova/transformers (JavaScript / Node / Browser)
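import { pipeline } from "@xenova/transformers";
// Load the feature-extraction pipeline for this model.
const pipe = await pipeline(
"feature-extraction",
"SamuelMauli/parity-embedding-juridico-br-v1"
);
// Mean-pool the token embeddings and L2-normalize the sentence vector.
const out = await pipe("texto jurídico", { pooling: "mean", normalize: true });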
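The pooling: "mean" and normalize: true options in the call above turn per-token vectors into one unit-length sentence vector. The following Python sketch shows what those two steps mean conceptually, assuming mask-aware mean pooling followed by L2 normalization. It is not the library's code, and the token vectors and attention mask are made-up values.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_vectors[0])
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m]
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


# Three toy 2-d token vectors; the last one is padding and is masked out.
tokens = [[3.0, 0.0], [1.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_vec = l2_normalize(pooled)
print(pooled, sentence_vec)
```

With unit-length vectors, the dot product of two embeddings equals their cosine similarity, which simplifies downstream scoring.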
Caveats
- Trained on a small (~130 triplets), domain-narrow dataset.
- Excellent for retrieval among TCU acórdãos and Brazilian procurement literature; do NOT use for general-purpose embedding tasks.
- Output dim: 384 (preserves compatibility with vector(384) in pgvector).
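As a sketch of the pgvector compatibility mentioned above, the snippet below formats an embedding as a pgvector text literal and pairs it with example SQL. The table name, column names, and the %s placeholder style are assumptions made for illustration; only the vector(384) type and the 384-dimension output come from this card. The `<=>` operator is pgvector's cosine-distance operator.

```python
EMBEDDING_DIM = 384  # output dimension stated in this model card


def to_pgvector_literal(vec, dim=EMBEDDING_DIM):
    """Format a vector as pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    if len(vec) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(vec)}")
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


# Hypothetical schema: table and column names are illustrative only.
CREATE_TABLE_SQL = (
    "CREATE TABLE acordaos ("
    "id bigserial PRIMARY KEY, texto text, embedding vector(384));"
)

# Nearest neighbours by cosine distance; the %s placeholder style is an
# assumption and depends on the database driver in use.
SEARCH_SQL = (
    "SELECT id, texto FROM acordaos "
    "ORDER BY embedding <=> %s::vector LIMIT 5;"
)

literal = to_pgvector_literal([0.5] * EMBEDDING_DIM)
print(literal[:20] + "...")
```

Checking the dimension before inserting catches a mismatched model early, since a vector(384) column rejects vectors of any other length.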
Citation
Maintained by Doublethree / Parity (samuel.mauli@gmail.com).