metadata

tags:
  - spacy
  - token-classification
  - ner
language:
  - en
license: mit
model-index:
  - name: en_core_web_sm_job
    results:
      - task:
          name: NER
          type: token-classification
        metrics:
          - name: NER Precision
            type: precision
            value: 0.7516398746
          - name: NER Recall
            type: recall
            value: 0.6069711538
          - name: NER F Score
            type: f_score
            value: 0.6742971968
      - task:
          name: TAG
          type: token-classification
        metrics:
          - name: TAG (XPOS) Accuracy
            type: accuracy
            value: 0.7334810915
library_name: spacy
pipeline_tag: token-classification

Custom spaCy NER Model for "Profession," "Facility," and "Experience" Entities

Overview

This spaCy-based Named Entity Recognition (NER) model has been custom-trained to recognize and classify entities related to "profession," "facility," and "experience." It is designed to enhance your text analysis capabilities by identifying these specific entity types in unstructured text data.

Key Features

Custom-trained to recognize "profession," "facility," and "experience" entities; scores are reported under Accuracy.
Suitable for tasks on professional information streams, such as information extraction and content categorization.
Currently focused on the job-seeker domain.
Can be integrated into existing spaCy-based NLP pipelines.

Usage

Installation

You can download the custom spaCy NER model by cloning its Hugging Face repository; Git LFS fetches the large model files:

git lfs install
git clone https://huggingface.co/LPDoctor/en_core_web_sm_job_related

Example Usage

Once the model package is installed in your environment, you can use it for entity recognition in Python:


import spacy

# Load the custom spaCy NER model
nlp = spacy.load("en_core_web_sm_job")

# Process your text
text = "HR Specialist needed at Google, Dallas, TX, with expertise in employee relations and a minimum of 4 years of HR experience."
doc = nlp(text)

# Extract named entities
for ent in doc.ents:
    print(f"Entity: {ent.text}, Type: {ent.label_}")

Entity Types

The model recognizes the following entity types:

PROFESSION: Represents professions or job titles.
FACILITY: Denotes facilities, buildings, or locations.
EXPERIENCE: Identifies mentions of work experience, durations, or qualifications.
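
As a rough illustration of how these labels might be used downstream, the sketch below keeps only the three custom entity types and groups the matched text by label. The helper name `group_custom_entities` and the sample predictions are hypothetical; the sample is hand-written for illustration and is not output from the model.

```python
from collections import defaultdict

# The three custom labels described above.
CUSTOM_LABELS = {"PROFESSION", "FACILITY", "EXPERIENCE"}


def group_custom_entities(entities):
    """Group (text, label) pairs by label, keeping only the custom labels.

    `entities` is any iterable of (text, label) pairs, for example
    [(ent.text, ent.label_) for ent in doc.ents].
    """
    grouped = defaultdict(list)
    for text, label in entities:
        if label in CUSTOM_LABELS:
            grouped[label].append(text)
    return dict(grouped)


# Hypothetical predictions, written by hand for illustration only.
sample = [
    ("HR Specialist", "PROFESSION"),
    ("Google", "ORG"),
    ("4 years of HR experience", "EXPERIENCE"),
]
print(group_custom_entities(sample))
```

With a loaded pipeline, the same helper could be called as `group_custom_entities((ent.text, ent.label_) for ent in doc.ents)`.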

Feature	Description
Name	`en_core_web_sm_job`
Version	`3.7.0`
spaCy	`>=3.7.0,<3.8.0`
Default Pipeline	`tok2vec`, `tagger`, `parser`, `attribute_ruler`, `lemmatizer`, `ner`
Components	`tok2vec`, `tagger`, `parser`, `senter`, `attribute_ruler`, `lemmatizer`, `ner`
Vectors	0 keys, 0 unique vectors (0 dimensions)
Sources	OntoNotes 5 (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston); ClearNLP Constituent-to-Dependency Conversion (Emory University); WordNet 3.0 (Princeton University)
License	`MIT`
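
Because the model declares a spaCy range of `>=3.7.0,<3.8.0`, it can help to check the installed version before loading it. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `is_supported_spacy` are hypothetical, and the parser is a simplification that looks only at the leading numeric part of the version string and ignores pre-release suffixes.

```python
import re

# Range copied from the "spaCy" row of the table: >=3.7.0,<3.8.0
MIN_VERSION = (3, 7, 0)
MAX_VERSION_EXCLUSIVE = (3, 8, 0)


def parse_version(version):
    """Return the leading major.minor[.patch] numbers as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_supported_spacy(version):
    """Check whether a version string falls inside the declared range."""
    return MIN_VERSION <= parse_version(version) < MAX_VERSION_EXCLUSIVE


for candidate in ("3.6.1", "3.7.2", "3.8.0"):
    print(candidate, is_supported_spacy(candidate))
```

In practice the string to check would be `spacy.__version__`, read after `import spacy`.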

Label Scheme

The label scheme covers 116 labels across 3 components.

Component	Labels
`tagger`	`$`, `''`, `,`, `-LRB-`, `-RRB-`, `.`, `:`, `ADD`, `AFX`, `CC`, `CD`, `DT`, `EX`, `FW`, `HYPH`, `IN`, `JJ`, `JJR`, `JJS`, `LS`, `MD`, `NFP`, `NN`, `NNP`, `NNPS`, `NNS`, `PDT`, `POS`, `PRP`, `PRP$`, `RB`, `RBR`, `RBS`, `RP`, `SYM`, `TO`, `UH`, `VB`, `VBD`, `VBG`, `VBN`, `VBP`, `VBZ`, `WDT`, `WP`, `WP$`, `WRB`, `XX`, `_SP`, ````
`parser`	`ROOT`, `acl`, `acomp`, `advcl`, `advmod`, `agent`, `amod`, `appos`, `attr`, `aux`, `auxpass`, `case`, `cc`, `ccomp`, `compound`, `conj`, `csubj`, `csubjpass`, `dative`, `dep`, `det`, `dobj`, `expl`, `intj`, `mark`, `meta`, `neg`, `nmod`, `npadvmod`, `nsubj`, `nsubjpass`, `nummod`, `oprd`, `parataxis`, `pcomp`, `pobj`, `poss`, `preconj`, `predet`, `prep`, `prt`, `punct`, `quantmod`, `relcl`, `xcomp`
`ner`	`CARDINAL`, `DATE`, `EVENT`, `EXPERIENCE`, `FAC`, `FACILITY`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PRODUCT`, `PROFESSION`, `QUANTITY`, `TIME`, `WORK_OF_ART`

Accuracy

Type	Score
`TOKEN_P`	78.59
`TOKEN_R`	63.58
`TOKEN_F`	70.57
`CUSTOM_TAG_ACC`	71.98