🧠 GLiClass Gender Classifier: DeBERTaV3 Uni-Encoder (3-Class)

This model classifies clinical narratives by the patient's sex or gender. It was fine-tuned from microsoft/deberta-v3-small using a GLiClass uni-encoder architecture, and it outputs one of three labels:

  • male
  • female
  • sex undetermined
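Because the three labels are mutually exclusive, the natural decision rule is to keep only the highest-scoring label, which is what the `"single-label"` setting in the usage example is for. The sketch below contrasts that argmax rule with a threshold rule of the kind a multi-label setup would use. It operates on a plain `{label: score}` dict; the function names, the 0.5 threshold, and the scores are hypothetical and are not part of the GLiClass API.

```python
from typing import Dict, List


def single_label_decision(scores: Dict[str, float]) -> str:
    """Return the one label with the highest score (labels are mutually exclusive)."""
    return max(scores, key=scores.get)


def multi_label_decision(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every label whose score reaches the threshold (may be empty or several)."""
    return [label for label, score in scores.items() if score >= threshold]


# Hypothetical scores for one clinical note
example_scores = {"male": 0.12, "female": 0.81, "sex undetermined": 0.07}
print(single_label_decision(example_scores))  # female
print(multi_label_decision(example_scores))   # ['female']
```

With a threshold rule, a note can end up with zero or several labels, which is why the single-label rule fits this three-way task better.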

🧪 Task

This is a multi-class text classification task over clinical free text. Given a discharge summary, case description, or other medical note, the model predicts the patient's sex or gender, or "sex undetermined" when the text does not indicate it.

⚠️ It is strongly recommended to keep the labels and the input text in the same language (e.g., both in Spanish or both in English) to ensure optimal model performance. Mixing languages may reduce accuracy.


🧩 Model Architecture

  • Base: microsoft/deberta-v3-small
  • Architecture: DebertaV2ForSequenceClassification
  • Fine-tuned with a uni-encoder setup
  • 3 output labels

πŸ” Input Format

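The usage example below reads a JSON file containing a list of samples, each an object like the one below (the text is Spanish for "A 63-year-old patient who reported a visual acuity (VA) deficit..."):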

```json
{
  "text": "Paciente de 63 años que refería déficit de agudeza visual (AV)...",
  "all_labels": ["male", "female", "sex undetermined"],
  "true_labels": ["sex undetermined"]
}
```
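A test file in this format can be assembled and checked with the standard `json` module. The sketch below is one way to do it, assuming the file is a JSON list of such objects, which is how the usage example loads it. `make_sample`, `validate_sample`, the second clinical note, and the `test_data.json` filename are hypothetical illustrations, not part of the released model or of GLiClass.

```python
import json
from typing import Any, Dict

LABELS = ["male", "female", "sex undetermined"]


def make_sample(text: str, true_label: str) -> Dict[str, Any]:
    """Build one sample in the input format described above (hypothetical helper)."""
    if true_label not in LABELS:
        raise ValueError(f"unknown label: {true_label!r}")
    return {"text": text, "all_labels": list(LABELS), "true_labels": [true_label]}


def validate_sample(sample: Dict[str, Any]) -> None:
    """Raise ValueError if a sample does not follow the expected structure."""
    for key in ("text", "all_labels", "true_labels"):
        if key not in sample:
            raise ValueError(f"missing key: {key!r}")
    if not isinstance(sample["text"], str) or not sample["text"].strip():
        raise ValueError("'text' must be a non-empty string")
    missing = set(sample["true_labels"]) - set(sample["all_labels"])
    if missing:
        raise ValueError(f"true_labels not in all_labels: {sorted(missing)}")


# The first note is the example above; the second is a hypothetical note
samples = [
    make_sample("Paciente de 63 años que refería déficit de agudeza visual (AV)...", "sex undetermined"),
    make_sample("Varón de 45 años que acude a urgencias por dolor torácico.", "male"),
]
for s in samples:
    validate_sample(s)

# ensure_ascii=False keeps accented Spanish characters readable in the file
with open("test_data.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)
```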

## Usage example
```python
import json

import torch
from transformers import AutoTokenizer
from gliclass import GLiClassModel, ZeroShotClassificationPipeline

# The GLiClass pipeline expects a device string such as "cuda:0" or "cpu"
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model_path = "BSC-NLP4BIA/GLiClass-gender-classifier"
classification_type = "single-label"  # or "multi-label"
test_path = "path/to/your/test_data.json"

print(f"🔄 Loading model from {model_path}...")
model = GLiClassModel.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.to(device)

pipeline = ZeroShotClassificationPipeline(
    model=model,
    tokenizer=tokenizer,
    classification_type=classification_type,
    device=device,
)

# The test file is a JSON list of samples in the input format shown above
with open(test_path, "r", encoding="utf-8") as f:
    test_data = json.load(f)

# 🔍 Collect candidate labels from the dataset. "all_labels" is included so
# that labels which never appear in "true_labels" are still scored.
all_labels = set()
for sample in test_data:
    all_labels.update(sample.get("all_labels", []))
    all_labels.update(sample["true_labels"])
candidate_labels = sorted(all_labels)

print(f"🧾 Candidate labels: {candidate_labels}")

results = []

for sample in test_data:
    true_labels = sample["true_labels"]
    # The pipeline returns one list of {"label": ..., "score": ...} dicts per input text
    output = pipeline(sample["text"], candidate_labels)
    top_results = output[0]

    # Keep the highest-scoring label (empty if the pipeline returned nothing)
    predicted_labels = (
        [max(top_results, key=lambda x: x["score"])["label"]] if top_results else []
    )
    score_dict = {d["label"]: d["score"] for d in top_results}

    entry = {
        "text": sample["text"],
        "true_labels": true_labels,
        "predicted_labels": predicted_labels,
    }
    # Add a score for each candidate label (0.0 if the pipeline did not return it)
    for label in candidate_labels:
        entry[f"score_{label}"] = score_dict.get(label, 0.0)

    results.append(entry)

# Save the predictions (the output filename is only an example)
with open("predictions.json", "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)
```
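The loop above collects predictions but does not score them. A minimal sketch of an evaluation step follows, assuming each entry in `results` holds one true label and at most one predicted label, as in the single-label setting. It reports accuracy and counts of (true, predicted) pairs, which form the cells of a confusion matrix. The `evaluate` function and the example entries are hypothetical; for fuller reports a library such as scikit-learn could be used instead.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def evaluate(results: List[Dict[str, Any]]) -> Tuple[float, Counter]:
    """Compute accuracy and (true, predicted) pair counts for single-label results.

    Each entry needs "true_labels" and "predicted_labels" lists, as built by the
    usage example; an empty prediction is counted as "<none>" and as an error.
    """
    pairs: Counter = Counter()
    correct = 0
    for entry in results:
        true = entry["true_labels"][0]
        pred = entry["predicted_labels"][0] if entry["predicted_labels"] else "<none>"
        pairs[(true, pred)] += 1
        correct += int(true == pred)
    accuracy = correct / len(results) if results else 0.0
    return accuracy, pairs


# Hypothetical results in the same shape the usage example produces
example_results = [
    {"true_labels": ["male"], "predicted_labels": ["male"]},
    {"true_labels": ["female"], "predicted_labels": ["female"]},
    {"true_labels": ["sex undetermined"], "predicted_labels": ["female"]},
    {"true_labels": ["male"], "predicted_labels": []},
]
accuracy, pairs = evaluate(example_results)
print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.50
for (true, pred), n in sorted(pairs.items()):
    print(f"{true:>16} -> {pred:<16} {n}")
```

Off-diagonal pairs such as `("sex undetermined", "female")` show which confusions dominate, which a single accuracy number hides.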
Model size: 142M parameters, stored as F32 tensors in Safetensors format.