---
tags:
- spacy
- token-classification
language:
- en
model-index:
- name: en_chemner
  results:
  - task:
      name: NER
      type: token-classification
    metrics:
    - name: NER Precision
      type: precision
      value: 0.9906542056
    - name: NER Recall
      type: recall
      value: 0.9636363636
    - name: NER F Score
      type: f_score
      value: 0.9769585253
widget:
- text: >-
    Cinnamaldehyde is a fragrant compound found in cinnamon.
    Icosanoic acid is a saturated fatty acid with a 20-carbon chain.
    Triptane is commonly used as an anti-knock additive in aviation fuels.
    Benzophenone is a widely used building block in organic chemistry, being the parent diarylketone.
    Geraniol is a monoterpenoid and an alcohol. It is the primary component of citronella oil and is a primary component of rose oil and palmarosa oil.
license: apache-2.0
---

# en_chemner: A spaCy Model for Chemical NER

## Model Description

The `en_chemner` model is a specialized Named Entity Recognition (NER) model for chemistry. Built with the spaCy framework, it identifies and classifies chemical entities in English-language text.

### Key Features

- **High Precision and Recall**: The model reports a precision of 99.07%, a recall of 96.36%, and an F-score of 97.70%, so both false positives and false negatives are rare.
- **Rich Label Scheme**: The model tags seven classes of chemical entity (`ALCOHOL`, `ALDEHYDE`, `ALKANE`, `ALKENE`, `ALKYNE`, `C_ACID`, `KETONE`), which makes it useful across different chemical analysis tasks.
- **Optimized for spaCy**: The model requires spaCy `>=3.6.1,<3.7.0` and plugs into existing spaCy pipelines and applications.
- **Extensive Vector Library**: The model ships with 514,157 unique vectors of 300 dimensions each, providing a broad foundation for recognizing and classifying chemical entities.

### Use Cases

The `en_chemner` model is well suited to:

- **Chemical Literature Analysis**: Automatically extracting chemical entities from research papers, patents, and textbooks.
- **Data Annotation**: Assisting in the annotation of chemical databases or creating datasets for further machine learning tasks.
- **Educational Purposes**: Helping students in chemistry-related fields to identify and understand various chemical compounds and their classifications.

| Feature | Description |
| --- | --- |
| **Name** | `en_chemner` |
| **Version** | `1.0.0` |
| **spaCy** | `>=3.6.1,<3.7.0` |
| **Default Pipeline** | `tok2vec`, `ner` |
| **Components** | `tok2vec`, `ner` |
| **Vectors** | 514157 keys, 514157 unique vectors (300 dimensions) |
| **Sources** | n/a |
| **License** | n/a |
| **Author** | n/a |

### Label Scheme

The pipeline has 7 labels, all on a single component:

| Component | Labels |
| --- | --- |
| **`ner`** | `ALCOHOL`, `ALDEHYDE`, `ALKANE`, `ALKENE`, `ALKYNE`, `C_ACID`, `KETONE` |
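
The labels above are what spaCy exposes on each entity span as `ent.label_`. The following is a minimal sketch of how the model might be loaded and its predictions grouped by label. It assumes the package is already installed under the name `en_chemner` in an environment with a compatible spaCy version; `extract_entities` and `group_by_label` are hypothetical helper names for illustration, not part of the model package.

```python
from collections import defaultdict


def extract_entities(text, model_name="en_chemner"):
    """Run the pipeline and return (entity text, label) pairs.

    Assumes `model_name` is an installed spaCy package.
    """
    import spacy  # imported here so group_by_label works without spaCy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def group_by_label(entities):
    """Group (text, label) pairs into a dict of label -> list of texts."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)
```

Calling `group_by_label(extract_entities("Geraniol is a monoterpenoid and an alcohol."))` would return a dictionary keyed by whichever of the seven labels the model predicts for that sentence.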

### Accuracy

| Type | Score |
| --- | --- |
| `ENTS_F` | 97.70 |
| `ENTS_P` | 99.07 |
| `ENTS_R` | 96.36 |
| `TOK2VEC_LOSS` | 151.95 |
| `NER_LOSS` | 259.22 |
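
`ENTS_F` is the harmonic mean of `ENTS_P` and `ENTS_R`. As a quick consistency check, the sketch below recomputes it from the unrounded precision and recall values given in the model metadata. The function name `f_score` is chosen here for illustration.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Unrounded values from the model card metadata.
precision = 0.9906542056
recall = 0.9636363636

f1 = f_score(precision, recall)
print(f"{f1 * 100:.2f}")  # prints 97.70
```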