DisEmbed (Disease Embedding)

DisEmbed-v1 is a disease-focused embedding model for the medical domain, trained on a synthetic dataset of disease descriptions, symptoms, and Q&A pairs. It outperforms general medical models on disease-specific tasks, particularly in distinguishing similar diseases. DisEmbed excels at retrieval tasks and disease-context identification.

Model Details

Model Description

Dataset: DisEmbed-Symptom-Disease-v1
Paper: DisEmbed: Transforming Disease Understanding through Embeddings
Maximum Sequence Length: 512 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity
Language: English
License: MIT
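
Since the model compares embeddings with cosine similarity, it may help to see what that measure computes. The following is a minimal pure-Python sketch on toy 3-dimensional vectors, not the library's implementation; the helper name `cosine_similarity` and the vectors `u`, `v`, `w` are illustrative assumptions, and real DisEmbed embeddings have 384 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors for illustration only; they are not model output.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]   # same direction as u, so the score is close to 1
w = [3.0, 0.0, -1.0]  # orthogonal to u, so the score is 0

print(cosine_similarity(u, v))
print(cosine_similarity(u, w))
```

The score depends only on direction, not on vector length, which is why a short disease name and a long symptom description can still be compared directly.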

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("SalmanFaroz/DisEmbed-v1")
# Run inference
sentences = [
    'Chronic cough with blood-streaked sputum, severe night sweats, and unintentional weight loss. Painful breathing or chest pain, often worsened by coughing. Swelling in the neck or lymph nodes, and frequent fatigue.',
    'Asthma',
    'Tuberculosis'
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
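
To turn the similarity matrix above into a retrieval result, the scores between the symptom description (row 0) and each candidate disease can be sorted. The sketch below assumes a hypothetical helper, `rank_candidates`, and uses made-up scores so that it runs without downloading the model; with the real model, the scores could be taken from `similarities[0, 1:].tolist()`.

```python
def rank_candidates(scores, labels):
    """Pair each label with its score and sort from most to least similar.

    Hypothetical helper for illustration; not part of Sentence Transformers.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    return sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)


# Illustrative scores only, NOT real model output.
# With the model loaded, one could instead use:
#   example_scores = similarities[0, 1:].tolist()
example_scores = [0.31, 0.74]
example_labels = ["Asthma", "Tuberculosis"]

ranked = rank_candidates(example_scores, example_labels)
for label, score in ranked:
    print(f"{label}: {score:.2f}")
```

The first entry of `ranked` is the candidate whose embedding lies closest to the symptom description.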

Citation

@article{faroz2024disembed,
  title={DisEmbed: Transforming Disease Understanding through Embeddings},
  author={Faroz, Salman},
  journal={arXiv preprint arXiv:2412.15258},
  year={2024},
  doi={10.48550/arXiv.2412.15258},
  url={https://arxiv.org/abs/2412.15258}
}
