# GLiClass Gender Classifier – DeBERTaV3 Uni-Encoder (3-Class)
This model is designed for text classification in clinical narratives, specifically for determining a patient's sex or gender. It was fine-tuned using a uni-encoder architecture based on `microsoft/deberta-v3-small`, and outputs one of three labels:

- `male`
- `female`
- `sex undetermined`
## Task
This is a multi-class text classification task over clinical free-text. The model predicts the gender of a patient from discharge summaries, case descriptions, or medical notes.
⚠️ It is strongly recommended to keep the labels and the input text in the same language (e.g., both in Spanish or both in English) to ensure optimal model performance. Mixing languages may reduce accuracy.
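For example, if your notes are in Spanish, you can pass Spanish label names instead of the English ones. This is only a minimal sketch: the Spanish strings below are illustrative translations of the canonical labels, not an official label set shipped with the model.

```python
import torch
from transformers import AutoTokenizer
from gliclass import GLiClassModel, ZeroShotClassificationPipeline

model_path = "BSC-NLP4BIA/GLiClass-gender-classifier"
model = GLiClassModel.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
pipeline = ZeroShotClassificationPipeline(
    model=model,
    tokenizer=tokenizer,
    classification_type="single-label",
    device="cuda:0" if torch.cuda.is_available() else "cpu",
)

# Spanish clinical note -> Spanish label names (illustrative translations)
text = "Paciente de 63 años que refería déficit de agudeza visual (AV)."
labels = ["varón", "mujer", "sexo indeterminado"]
print(pipeline(text, labels)[0])  # list of {"label": ..., "score": ...}
```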
## Model Architecture
- Base: `microsoft/deberta-v3-small`
- Architecture: `DebertaV2ForSequenceClassification`
- Fine-tuned with a uni-encoder setup
- 3 output labels
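As a quick sanity check, the snippet below loads the checkpoint and prints its class name and configuration; this is a minimal sketch, and the exact fields shown depend on how the GLiClass config wraps the DeBERTa backbone.

```python
from gliclass import GLiClassModel

model = GLiClassModel.from_pretrained("BSC-NLP4BIA/GLiClass-gender-classifier")

# GLiClassModel is a Hugging Face PreTrainedModel, so the standard config is attached.
print(type(model).__name__)
print(model.config)
```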
## Input Format
Each input sample must be a JSON object like this:
```json
{
  "text": "Paciente de 63 años que refería déficit de agudeza visual (AV)...",
  "all_labels": ["male", "female", "sex undetermined"],
  "true_labels": ["sex undetermined"]
}
```
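For instance, a compatible test file can be written like this (a minimal sketch; the file name `test_data.json` is just an example):

```python
import json

samples = [
    {
        "text": "Paciente de 63 años que refería déficit de agudeza visual (AV)...",
        "all_labels": ["male", "female", "sex undetermined"],
        "true_labels": ["sex undetermined"],
    }
]

# The usage example below loads a JSON list of such objects.
with open("test_data.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)
```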
## Usage example
```python
import json

import torch
from transformers import AutoTokenizer
from gliclass import GLiClassModel, ZeroShotClassificationPipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model_path = "BSC-NLP4BIA/GLiClass-gender-classifier"
classification_type = "single-label"  # or "multi-label"
test_path = "path/to/your/test_data.json"

print(f"Loading model from {model_path}...")
model = GLiClassModel.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.to(device)

pipeline = ZeroShotClassificationPipeline(
    model=model,
    tokenizer=tokenizer,
    classification_type=classification_type,
    device=device,
)

with open(test_path, "r", encoding="utf-8") as f:
    test_data = json.load(f)

# Automatically infer candidate labels from the dataset
all_labels = set()
for sample in test_data:
    all_labels.update(sample["true_labels"])
candidate_labels = sorted(all_labels)
print(f"Candidate labels inferred: {candidate_labels}")

results = []
for sample in test_data:
    true_labels = sample["true_labels"]
    output = pipeline(sample["text"], candidate_labels)
    top_results = output[0]

    # Keep only the highest-scoring label (single-label setting)
    predicted_labels = [max(top_results, key=lambda x: x["score"])["label"]]
    score_dict = {d["label"]: d["score"] for d in top_results}

    entry = {
        "text": sample["text"],
        "true_labels": true_labels,
        "predicted_labels": predicted_labels,
    }
    # Add a score column for each candidate label
    for label in candidate_labels:
        entry[f"score_{label}"] = score_dict.get(label, 0.0)
    results.append(entry)
```
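Once the loop has populated `results`, a quick accuracy check is straightforward (a minimal sketch assuming the single-label setting used above):

```python
# Count samples where the single predicted label matches the gold annotation.
correct = sum(1 for r in results if r["predicted_labels"] == r["true_labels"])
accuracy = correct / len(results) if results else 0.0
print(f"Accuracy: {accuracy:.3f}")
```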