DisEmbed (Disease Embedding)
Embedding Model for Diseases
DisEmbed-v1 is a disease-focused embedding model for the medical domain, trained on a synthetic dataset of disease descriptions, symptoms, and Q&A pairs. It outperforms general medical models on disease-specific tasks, particularly in distinguishing between similar diseases, and performs strongly on retrieval and disease-context identification tasks.
First, install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("SalmanFaroz/DisEmbed-v1")
# Run inference
sentences = [
    'Chronic cough with blood-streaked sputum, severe night sweats, and unintentional weight loss. Painful breathing or chest pain, often worsened by coughing. Swelling in the neck or lymph nodes, and frequent fatigue.',
'Asthma',
'Tuberculosis'
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# 3x3 tensor: row 0 scores the symptom description against itself, 'Asthma', and 'Tuberculosis'
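The similarity matrix above already scores the symptom description against both candidate diseases. To make the disease-identification use case explicit, here is a minimal sketch (not part of the DisEmbed release) of a hypothetical helper, `rank_diseases`, that ranks candidate disease names by cosine similarity to a single symptom embedding. It assumes embeddings shaped like those returned by `model.encode`, and uses only NumPy with toy 3-dimensional vectors so it runs without downloading the model.

```python
import numpy as np


def rank_diseases(query_emb, disease_embs, disease_names):
    """Hypothetical helper: rank disease names by cosine similarity to a query.

    query_emb:     shape (d,), e.g. model.encode(symptom_text)
    disease_embs:  shape (n, d), e.g. model.encode(disease_names)
    Returns a list of (name, score) pairs, highest similarity first.
    """
    q = np.asarray(query_emb, dtype=np.float64)
    D = np.asarray(disease_embs, dtype=np.float64)
    # L2-normalise so the dot product equals cosine similarity
    q = q / np.linalg.norm(q)
    D = D / np.linalg.norm(D, axis=1, keepdims=True)
    scores = D @ q
    # Sort indices by descending score
    order = np.argsort(-scores)
    return [(disease_names[i], float(scores[i])) for i in order]


# Toy 3-d vectors standing in for real 384-d DisEmbed embeddings (assumption)
query = np.array([1.0, 0.2, 0.0])
diseases = np.array([
    [0.1, 1.0, 0.0],  # stand-in for "Asthma"
    [0.9, 0.3, 0.1],  # stand-in for "Tuberculosis"
])
ranking = rank_diseases(query, diseases, ["Asthma", "Tuberculosis"])
print(ranking[0][0])  # Tuberculosis
```

With the real model you would pass `model.encode(symptom_text)` and `model.encode(["Asthma", "Tuberculosis"])` instead of the toy arrays. Normalising explicitly keeps the ranking cosine-based even if the embeddings are not already unit-length.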
Citation
@article{faroz2024disembed,
title={DisEmbed: Transforming Disease Understanding through Embeddings},
author={Faroz, Salman},
journal={arXiv preprint arXiv:2412.15258},
year={2024},
doi={10.48550/arXiv.2412.15258},
url={https://arxiv.org/abs/2412.15258}
}
Base model
BAAI/bge-small-en-v1.5