Update README.md
README.md
CHANGED
|
@@ -74,7 +74,7 @@ OpenBioNER outperforms all competing models, achieving the **highest average per
|
|
| 74 |
| UniNER | 7B | 25.1 | 60.4 | 48.1 | 46.2 | 47.9 | **68.0** | 50.2 | **53.4** | 49.9 |
|
| 75 |
| GLiNER_large-v1 | 459M | 33.3 | **61.9** | **57.1** | 47.9 | 43.1 | 66.4 | 51.9 | **53.4** | 51.9 |
|
| 76 |
| OpenBioNER *(Ours)* | 110M | 35.2 | 58.5 | **57.1** | **49.1** | **48.0** | 60.4 | **63.9** | 50.9 | **52.9** |
|
| 77 |
-
| OpenBioNER *(Ours)* - Zshot | 110M | 34.8 | 57.8 | 56.8 | 49.5 | 47.1 | 60.1 | 64.6 | 52.
|
| 78 |
|
| 79 |
> ⚠️ **Disclaimer**: Please note that running evaluations using the `zshot` library may lead to slightly different results on certain benchmarks compared to those reported in the paper (above). This discrepancy is due to differences in token alignment: `zshot` uses spaCy's character-based span matching, while our experiments use token-level alignment as handled by BERT-based NER pipelines. These differences can affect how entity spans are matched and evaluated, particularly in cases with subword tokenization or punctuation.
|
| 80 |
|
|
@@ -99,7 +99,6 @@ This is the description used as NEG class (e.g. not an entity) for all the datas
|
|
| 99 |
| :------ | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
| 100 |
| DISEASE | A disease is a medical condition that disrupts normal bodily functions or structures, affecting various organs or systems, and leading to symptoms like muscle weakness, fatigue, stiffness, or cognitive impairment. Diseases can impact muscles, the nervous system, heart, eyes, and more, and may be chronic or acute, such as diabetes, cardiovascular or neurological disorders, and cancer-related conditions like lymphoblastic leukemia or lymphoma. |
|
| 101 |
|
| 102 |
-
---
|
| 103 |
|
| 104 |
### AnatEM
|
| 105 |
|
|
@@ -107,7 +106,6 @@ This is the description used as NEG class (e.g. not an entity) for all the datas
|
|
| 107 |
| :------ | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| 108 |
| ANATOMY | The anatomy refers to biological components at various scales, including cells, tissues, and organs. These entities can be identified by proper nouns referring to cell types (e.g., HeLa cells, neurospheres, NSCLC, SCC), body parts (e.g., serum, blood) or biological substances (e.g., vegetables, meats, cow milk) or tumors. |
|
| 109 |
|
| 110 |
-
---
|
| 111 |
|
| 112 |
### BC4CHEMD
|
| 113 |
|
|
@@ -115,7 +113,6 @@ This is the description used as NEG class (e.g. not an entity) for all the datas
|
|
| 115 |
| :------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| 116 |
| CHEMICAL | Chemicals are substances that are composed of one or more elements, typically consisting of atoms bonded together by chemical bonds. They can be naturally occurring, such as vitamins or sterols, or synthesized, like alkylcarbazoles or tetrachlorodibenzo-p-dioxins (TCDD). Chemicals can also be modified or combined to form new compounds, such as esters or polymers. |
|
| 117 |
|
| 118 |
-
---
|
| 119 |
|
| 120 |
### BC2GM
|
| 121 |
|
|
@@ -123,7 +120,6 @@ This is the description used as NEG class (e.g. not an entity) for all the datas
|
|
| 123 |
| :--- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| 124 |
| GENE | A gene is a unit of heredity that carries information from one generation to the next and is composed of DNA sequences that encode the instructions for the development, growth, and function of an organism. It can be a segment of DNA that is passed from one generation to the next and is responsible for the transmission of traits from parents to offspring. A gene is often represented using a three-letter code (e.g., trios, ABL, DNA-PK). |
|
| 125 |
|
| 126 |
-
---
|
| 127 |
|
| 128 |
### BC5CDR
|
| 129 |
|
|
@@ -132,7 +128,6 @@ This is the description used as NEG class (e.g. not an entity) for all the datas
|
|
| 132 |
| CHEMICAL | Chemicals are substances that are composed of atoms, either bonded together in a molecule or as a mixture of different substances. This includes medications (e.g., nitroarginine methyl ester, nifedipine, prednisolone, methyldopa), compounds (e.g., potassium, calcium, ammonium), and other substances that can have various effects on the body. |
|
| 133 |
| DISEASE | Diseases are any medical condition that affects the normal functioning of the body, resulting in symptoms, discomfort, or potentially life-threatening complications. This includes chronic and acute disorders, conditions affecting specific bodily systems, cancer-related conditions, and complications arising from medical treatments or external factors. |
|
| 134 |
|
| 135 |
-
---
|
| 136 |
|
| 137 |
### JNLPBA
|
| 138 |
|
|
@@ -144,7 +139,6 @@ This is the description used as NEG class (e.g. not an entity) for all the datas
|
|
| 144 |
| CELL\_LINE | A cell line is a population of cells derived from a single cell, cultured in vitro or in vivo. It can be normal or transformed, with genetic changes like mutations. Cell lines, such as B-cells or HeLa cells, are used in research to study cellular processes, model diseases, and develop treatments. |
|
| 145 |
| RNA | RNA is a type of nucleic acid that plays a crucial role in the transmission of genetic information from DNA to proteins. It is a single-stranded molecule composed of nucleotides, and its primary function is to carry genetic information from the nucleus to the ribosomes, where it is translated into proteins. |
|
| 146 |
|
| 147 |
-
---
|
| 148 |
|
| 149 |
### JNLPBA-Rare
|
| 150 |
|
|
@@ -153,22 +147,22 @@ This is the description used as NEG class (e.g. not an entity) for all the datas
|
|
| 153 |
| CELL\_LINE | A cell line is a population of cells derived from a single cell, cultured in vitro or in vivo. It can be normal or transformed, with genetic changes like mutations. Cell lines, such as B-cells or HeLa cells, are used in research to study cellular processes, model diseases, and develop treatments. |
|
| 154 |
| RNA | RNA is a type of nucleic acid that plays a crucial role in the transmission of genetic information from DNA to proteins. It is a single-stranded molecule composed of nucleotides, and its primary function is to carry genetic information from the nucleus to the ribosomes, where it is translated into proteins. |
|
| 155 |
|
| 156 |
-
---
|
| 157 |
|
| 158 |
### MedMentions-Rare
|
| 159 |
|
| 160 |
| TYPE | Description |
|
| 161 |
| :--- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| 162 |
-
| NEG | In this study, we fabricated prevascularized synthetic device ports to help mitigate this limitation. Thus, the optimum range of pore size for prevascularization of these membranes was estimated to be 75 - 100 μm. A total of 51 patients were included, 16 in group I and 35 in group II.
|
| 163 |
-
| Bacterium (T007) | A bacterium refers to a type of microorganism that can exist as a single cell and may cause infections or play a role in various biological processes. |
|
| 164 |
-
| Body Substance (T031) | A body substance is any material produced by or found within the body, such as blood, serum, saliva, sweat, or gastric acid. |
|
| 165 |
-
| Food (T168) | A food refers to any substance consumed to provide nutritional support for the body. This includes snacks, meat, dairy products, grains, and edible substances like carbohydrates, proteins, and fats. |
|
| 166 |
-
| Body System (T022) | A body system consists of interconnected organs and tissues working together to carry out essential functions. Examples include the gastrointestinal tract, nervous system, hematological system, and endocrine system. |
|
| 167 |
| Professional or Occupational Group (T097) | A professional refers to individuals who share the same profession, occupation, or role within a specific field. Examples include cardiologists, psychologists, assessors, hospice staff, and volunteers. |
|
| 168 |
|
| 169 |
---
|
| 170 |
|
| 171 |
|
|
|
|
| 172 |
# 🧬 How to Write Effective Entity Type Descriptions
|
| 173 |
|
| 174 |
Entity type descriptions are crucial for improving generalization in OpenBioNER. Well-written descriptions help models disambiguate types, handle rare classes, and align with real-world usage across diverse datasets.
|
|
|
|
| 74 |
| UniNER | 7B | 25.1 | 60.4 | 48.1 | 46.2 | 47.9 | **68.0** | 50.2 | **53.4** | 49.9 |
|
| 75 |
| GLiNER_large-v1 | 459M | 33.3 | **61.9** | **57.1** | 47.9 | 43.1 | 66.4 | 51.9 | **53.4** | 51.9 |
|
| 76 |
| OpenBioNER *(Ours)* | 110M | 35.2 | 58.5 | **57.1** | **49.1** | **48.0** | 60.4 | **63.9** | 50.9 | **52.9** |
|
| 77 |
+
| OpenBioNER *(Ours)* - Zshot | 110M | 34.8 | 57.8 | 56.8 | 49.5 | 47.1 | 60.1 | 64.6 | 52.9 | 53.0 |
|
| 78 |
|
| 79 |
> ⚠️ **Disclaimer**: Please note that running evaluations using the `zshot` library may lead to slightly different results on certain benchmarks compared to those reported in the paper (above). This discrepancy is due to differences in token alignment: `zshot` uses spaCy's character-based span matching, while our experiments use token-level alignment as handled by BERT-based NER pipelines. These differences can affect how entity spans are matched and evaluated, particularly in cases with subword tokenization or punctuation.
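
The alignment effect described in the disclaimer can be illustrated with a minimal, library-free sketch. This is not the code used by `zshot`, spaCy, or the paper's pipeline; the two regex tokenizers and the helper `align_strict` are hypothetical stand-ins, assumed here only to show how the same character span can survive under one tokenization and be dropped under another.

```python
import re


def tokenize(text, pattern):
    """Return (token, start, end) triples for every regex match in text."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, text)]


def align_strict(tokens, start, end):
    """Map a character span to inclusive token indices.

    Returns None when the span starts or ends inside a token, which is
    the situation where a strict matcher would discard the entity.
    """
    starts = {s: i for i, (_, s, _) in enumerate(tokens)}
    ends = {e: i for i, (_, _, e) in enumerate(tokens)}
    if start in starts and end in ends:
        return starts[start], ends[end]
    return None


text = "IL-2-mediated activation of T cells"
gold = (0, 4)  # character span covering "IL-2"

# Hypothetical tokenizers: one splits on whitespace only,
# the other also splits punctuation such as hyphens.
coarse = tokenize(text, r"\S+")
fine = tokenize(text, r"\w+|[^\w\s]")

print(align_strict(coarse, *gold))  # None: "IL-2" ends inside "IL-2-mediated"
print(align_strict(fine, *gold))    # (0, 2): tokens "IL", "-", "2"
```

Under the coarse tokenization the gold span cannot be represented at all, so an exact-match evaluation would count it as a miss regardless of what the model predicted. Small differences of this kind are one plausible source of the score gaps between the two rows.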
|
| 80 |
|
|
|
|
| 99 |
| :------ | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
| 100 |
| DISEASE | A disease is a medical condition that disrupts normal bodily functions or structures, affecting various organs or systems, and leading to symptoms like muscle weakness, fatigue, stiffness, or cognitive impairment. Diseases can impact muscles, the nervous system, heart, eyes, and more, and may be chronic or acute, such as diabetes, cardiovascular or neurological disorders, and cancer-related conditions like lymphoblastic leukemia or lymphoma. |
|
| 101 |
|
|
|
|
| 102 |
|
| 103 |
### AnatEM
|
| 104 |
|
|
|
|
| 106 |
| :------ | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| 107 |
| ANATOMY | The anatomy refers to biological components at various scales, including cells, tissues, and organs. These entities can be identified by proper nouns referring to cell types (e.g., HeLa cells, neurospheres, NSCLC, SCC), body parts (e.g., serum, blood) or biological substances (e.g., vegetables, meats, cow milk) or tumors. |
|
| 108 |
|
|
|
|
| 109 |
|
| 110 |
### BC4CHEMD
|
| 111 |
|
|
|
|
| 113 |
| :------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| 114 |
| CHEMICAL | Chemicals are substances that are composed of one or more elements, typically consisting of atoms bonded together by chemical bonds. They can be naturally occurring, such as vitamins or sterols, or synthesized, like alkylcarbazoles or tetrachlorodibenzo-p-dioxins (TCDD). Chemicals can also be modified or combined to form new compounds, such as esters or polymers. |
|
| 115 |
|
|
|
|
| 116 |
|
| 117 |
### BC2GM
|
| 118 |
|
|
|
|
| 120 |
| :--- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| 121 |
| GENE | A gene is a unit of heredity that carries information from one generation to the next and is composed of DNA sequences that encode the instructions for the development, growth, and function of an organism. It can be a segment of DNA that is passed from one generation to the next and is responsible for the transmission of traits from parents to offspring. A gene is often represented using a three-letter code (e.g., trios, ABL, DNA-PK). |
|
| 122 |
|
|
|
|
| 123 |
|
| 124 |
### BC5CDR
|
| 125 |
|
|
|
|
| 128 |
| CHEMICAL | Chemicals are substances that are composed of atoms, either bonded together in a molecule or as a mixture of different substances. This includes medications (e.g., nitroarginine methyl ester, nifedipine, prednisolone, methyldopa), compounds (e.g., potassium, calcium, ammonium), and other substances that can have various effects on the body. |
|
| 129 |
| DISEASE | Diseases are any medical condition that affects the normal functioning of the body, resulting in symptoms, discomfort, or potentially life-threatening complications. This includes chronic and acute disorders, conditions affecting specific bodily systems, cancer-related conditions, and complications arising from medical treatments or external factors. |
|
| 130 |
|
|
|
|
| 131 |
|
| 132 |
### JNLPBA
|
| 133 |
|
|
|
|
| 139 |
| CELL\_LINE | A cell line is a population of cells derived from a single cell, cultured in vitro or in vivo. It can be normal or transformed, with genetic changes like mutations. Cell lines, such as B-cells or HeLa cells, are used in research to study cellular processes, model diseases, and develop treatments. |
|
| 140 |
| RNA | RNA is a type of nucleic acid that plays a crucial role in the transmission of genetic information from DNA to proteins. It is a single-stranded molecule composed of nucleotides, and its primary function is to carry genetic information from the nucleus to the ribosomes, where it is translated into proteins. |
|
| 141 |
|
|
|
|
| 142 |
|
| 143 |
### JNLPBA-Rare
|
| 144 |
|
|
|
|
| 147 |
| CELL\_LINE | A cell line is a population of cells derived from a single cell, cultured in vitro or in vivo. It can be normal or transformed, with genetic changes like mutations. Cell lines, such as B-cells or HeLa cells, are used in research to study cellular processes, model diseases, and develop treatments. |
|
| 148 |
| RNA | RNA is a type of nucleic acid that plays a crucial role in the transmission of genetic information from DNA to proteins. It is a single-stranded molecule composed of nucleotides, and its primary function is to carry genetic information from the nucleus to the ribosomes, where it is translated into proteins. |
|
| 149 |
|
|
|
|
| 150 |
|
| 151 |
### MedMentions-Rare
|
| 152 |
|
| 153 |
| TYPE | Description |
|
| 154 |
| :--- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| 155 |
+
| NEG | In this study, we fabricated prevascularized synthetic device ports to help mitigate this limitation. Thus, the optimum range of pore size for prevascularization of these membranes was estimated to be 75 - 100 μm. A total of 51 patients were included, 16 in group I and 35 in group II. |
|
| 156 |
+
| Bacterium (T007) | A bacterium refers to a type of microorganism that can exist as a single cell and may cause infections or play a role in various biological processes. Examples include species like Streptococcus pneumoniae and Streptomyces ahygroscopicus. |
|
| 157 |
+
| Body Substance (T031) | A body substance is any material produced by or found within the body, such as blood, serum, saliva, sweat, or gastric acid. Specific examples include serum cytokine levels for immune responses, blood lipids for metabolic studies, and hemolymph glucose for stress responses. |
|
| 158 |
+
| Food (T168) | A food refers to any substance consumed to provide nutritional support for the body. This includes a wide range of items such as snacks, meat, dairy products, grains like wheat, and edible substances like carbohydrates, proteins, and fats. |
|
| 159 |
+
| Body System (T022) | A body system consists of interconnected organs and tissues working together to carry out essential functions. Examples include the gastrointestinal tract for digestion, the nervous system for sensory and motor control, the hematological system for blood-related functions, and the endocrine system for hormone regulation. |
|
| 160 |
| Professional or Occupational Group (T097) | A professional refers to individuals who share the same profession, occupation, or role within a specific field. Examples include cardiologists, psychologists, assessors, hospice staff, and volunteers. |
|
| 161 |
|
| 162 |
---
|
| 163 |
|
| 164 |
|
| 165 |
+
|
| 166 |
# 🧬 How to Write Effective Entity Type Descriptions
|
| 167 |
|
| 168 |
Entity type descriptions are crucial for improving generalization in OpenBioNER. Well-written descriptions help models disambiguate types, handle rare classes, and align with real-world usage across diverse datasets.
|