FremyCompany
/

BioLORD-2023

Sentence Similarity

sentence-transformers

feature-extraction

Inference Endpoints

Model card Files Files and versions Community

FremyCompany commited on Nov 27, 2023

Commit

4f4afec

•

1 Parent(s): 1343694

Update README.md

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -26,7 +26,7 @@ This model was trained using BioLORD, a new pre-training strategy for producing
 State-of-the-art methodologies operate by maximizing the similarity in representation of names referring to the same concept, and preventing collapse through contrastive learning. However, because biomedical names are not always self-explanatory, it sometimes results in non-semantic representations.
-BioLORD overcomes this issue by grounding its concept representations using definitions, as well as short descriptions derived from a multi-relational knowledge graph consisting of biomedical ontologies. Thanks to this grounding, our model produces more semantic concept representations that match more closely the hierarchical structure of ontologies. BioLORD-2023 establishes a new state of the art for text similarity on both clinical sentences (MedSTS) and biomedical concepts (MayoSRS).
 This model is based on [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) and was further finetuned on the [BioLORD-Dataset](https://huggingface.co/datasets/FremyCompany/BioLORD-Dataset) and LLM-generated definitions from the [Automatic Glossary of Clinical Terminology (AGCT)](https://huggingface.co/datasets/FremyCompany/AGCT-Dataset).

 State-of-the-art methodologies operate by maximizing the similarity in representation of names referring to the same concept, and preventing collapse through contrastive learning. However, because biomedical names are not always self-explanatory, it sometimes results in non-semantic representations.
+BioLORD overcomes this issue by grounding its concept representations using definitions, as well as short descriptions derived from a multi-relational knowledge graph consisting of biomedical ontologies. Thanks to this grounding, our model produces more semantic concept representations that match more closely the hierarchical structure of ontologies. BioLORD-2023 establishes a new state of the art for text similarity on both clinical sentences (MedSTS) and biomedical concepts (EHR-Rel-B).
 This model is based on [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) and was further finetuned on the [BioLORD-Dataset](https://huggingface.co/datasets/FremyCompany/BioLORD-Dataset) and LLM-generated definitions from the [Automatic Glossary of Clinical Terminology (AGCT)](https://huggingface.co/datasets/FremyCompany/AGCT-Dataset).