---
tags:
- spacy
- token-classification
- ner
language:
- en
license: mit
model-index:
- name: en_core_web_sm_job
  results:
  - task:
      name: NER
      type: token-classification
    metrics:
    - name: NER Precision
      type: precision
      value: 0.7516398746
    - name: NER Recall
      type: recall
      value: 0.6069711538
    - name: NER F Score
      type: f_score
      value: 0.6742971968
  - task:
      name: TAG
      type: token-classification
    metrics:
    - name: TAG (XPOS) Accuracy
      type: accuracy
      value: 0.7334810915
library_name: spacy
pipeline_tag: token-classification
---

# Custom spaCy NER Model for "Profession," "Facility," and "Experience" Entities

### Overview

This spaCy-based Named Entity Recognition (NER) model has been custom-trained to recognize and classify entities related to "profession," "facility," and "experience." It is designed to enhance your text analysis capabilities by identifying these specific entity types in unstructured text data.

### Key Features

- Custom-trained for high accuracy in recognizing "profession," "facility," and "experience" entities.
- Suitable for processing job-related text, e.g. information extraction, content categorization, and more.
- Currently focused on the job-seeker domain; can be easily integrated into your existing spaCy-based NLP pipelines.

### Usage

#### Installation

You can download the custom spaCy NER model from the Hugging Face Hub using Git LFS:

```bash
git lfs install
git clone https://huggingface.co/LPDoctor/en_core_web_sm_job_related
```

#### Example Usage

Here's how you can use the model for entity recognition in Python:

```python
import spacy

# Load the custom spaCy NER model
nlp = spacy.load("en_core_web_sm_job")

# Process your text
text = "HR Specialist needed at Google, Dallas, TX, with expertise in employee relations and a minimum of 4 years of HR experience."
doc = nlp(text)

# Extract named entities
for ent in doc.ents:
    print(f"Entity: {ent.text}, Type: {ent.label_}")
```

#### Entity Types

In addition to the standard OntoNotes labels (see the label scheme below), the model recognizes the following custom entity types:

- PROFESSION: Represents professions or job titles.
- FACILITY: Denotes facilities, buildings, or locations.
- EXPERIENCE: Identifies mentions of work experience, durations, or qualifications.
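Note that the installation step above clones the repository rather than installing a Python package, so `spacy.load` may need to point at the cloned directory instead of the package name. The sketch below illustrates this and filters the output down to the three custom labels; the local path `./en_core_web_sm_job_related`, the exact directory layout of the clone, and the sample sentence are assumptions.

```python
import spacy

# Assumption: the model can be loaded directly from the cloned directory;
# if it was installed as a package instead, spacy.load("en_core_web_sm_job")
# works as shown above.
nlp = spacy.load("./en_core_web_sm_job_related")

text = "Registered Nurse needed at Mercy Hospital with 3 years of experience."
doc = nlp(text)

# Keep only the three custom entity types documented above.
CUSTOM_LABELS = {"PROFESSION", "FACILITY", "EXPERIENCE"}
for ent in doc.ents:
    if ent.label_ in CUSTOM_LABELS:
        print(f"{ent.label_:>10}: {ent.text}")
```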
| Feature | Description |
| --- | --- |
| **Name** | `en_core_web_sm_job` |
| **Version** | `3.7.0` |
| **spaCy** | `>=3.7.0,<3.8.0` |
| **Default Pipeline** | `tok2vec`, `tagger`, `parser`, `attribute_ruler`, `lemmatizer`, `ner` |
| **Components** | `tok2vec`, `tagger`, `parser`, `senter`, `attribute_ruler`, `lemmatizer`, `ner` |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
| **Sources** | [OntoNotes 5](https://catalog.ldc.upenn.edu/LDC2013T19) (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston)<br />[ClearNLP Constituent-to-Dependency Conversion](https://github.com/clir/clearnlp-guidelines/blob/master/md/components/dependency_conversion.md) (Emory University)<br />[WordNet 3.0](https://wordnet.princeton.edu/) (Princeton University) |
| **License** | `MIT` |

### Label Scheme
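If only entity extraction is needed, the tagging, parsing, and lemmatization components listed in the pipeline table can be switched off at load time with spaCy's standard `disable` argument. A minimal sketch, assuming the model is loadable under the name `en_core_web_sm_job` and using an invented sample sentence:

```python
import spacy

# Disable components that are not needed for pure entity extraction.
# Component names come from the pipeline table above; tok2vec stays enabled.
nlp = spacy.load(
    "en_core_web_sm_job",
    disable=["tagger", "parser", "attribute_ruler", "lemmatizer"],
)

print(nlp.pipe_names)  # remaining active components, e.g. ['tok2vec', 'ner']

doc = nlp("Forklift Operator wanted at the Amazon warehouse, 2 years of experience preferred.")
print([(ent.text, ent.label_) for ent in doc.ents])
```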
<details>

<summary>View label scheme (116 labels for 3 components)</summary>

| Component | Labels |
| --- | --- |
| **`tagger`** | `$`, `''`, `,`, `-LRB-`, `-RRB-`, `.`, `:`, `ADD`, `AFX`, `CC`, `CD`, `DT`, `EX`, `FW`, `HYPH`, `IN`, `JJ`, `JJR`, `JJS`, `LS`, `MD`, `NFP`, `NN`, `NNP`, `NNPS`, `NNS`, `PDT`, `POS`, `PRP`, `PRP$`, `RB`, `RBR`, `RBS`, `RP`, `SYM`, `TO`, `UH`, `VB`, `VBD`, `VBG`, `VBN`, `VBP`, `VBZ`, `WDT`, `WP`, `WP$`, `WRB`, `XX`, `_SP`, ``` `` ``` |
| **`parser`** | `ROOT`, `acl`, `acomp`, `advcl`, `advmod`, `agent`, `amod`, `appos`, `attr`, `aux`, `auxpass`, `case`, `cc`, `ccomp`, `compound`, `conj`, `csubj`, `csubjpass`, `dative`, `dep`, `det`, `dobj`, `expl`, `intj`, `mark`, `meta`, `neg`, `nmod`, `npadvmod`, `nsubj`, `nsubjpass`, `nummod`, `oprd`, `parataxis`, `pcomp`, `pobj`, `poss`, `preconj`, `predet`, `prep`, `prt`, `punct`, `quantmod`, `relcl`, `xcomp` |
| **`ner`** | `CARDINAL`, `DATE`, `EVENT`, `EXPERIENCE`, `FAC`, `FACILITY`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PRODUCT`, `PROFESSION`, `QUANTITY`, `TIME`, `WORK_OF_ART` |

</details>
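The same label inventory can be inspected programmatically. A small sketch, assuming the model loads under the name used in the examples above:

```python
import spacy

nlp = spacy.load("en_core_web_sm_job")

# Print the number of labels each trained component predicts,
# which should match the label scheme table above.
for pipe_name in ("tagger", "parser", "ner"):
    labels = nlp.get_pipe(pipe_name).labels
    print(f"{pipe_name}: {len(labels)} labels, e.g. {labels[:5]}")
```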
### Accuracy

| Type | Score |
| --- | --- |
| `TOKEN_P` | 78.59 |
| `TOKEN_R` | 63.58 |
| `TOKEN_F` | 70.57 |
| `CUSTOM_TAG_ACC` | 71.98 |
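Scores like these can be reproduced on your own annotated data with spaCy's built-in scorer. The snippet below is a minimal sketch using a single hand-annotated sentence (the sentence and its character offsets are invented for illustration); a meaningful evaluation requires a held-out, fully annotated dev set.

```python
import spacy
from spacy.training import Example

nlp = spacy.load("en_core_web_sm_job")

# One tiny hand-annotated example; offsets refer to this sentence only.
text = "Registered Nurse needed at Mercy Hospital with 3 years of experience."
gold_entities = [(0, 16, "PROFESSION"), (27, 41, "FACILITY"), (47, 68, "EXPERIENCE")]
example = Example.from_dict(nlp.make_doc(text), {"entities": gold_entities})

# Language.evaluate runs the pipeline over the examples and returns a dict of
# scores, including entity precision/recall/F (ents_p, ents_r, ents_f).
scores = nlp.evaluate([example])
print({k: v for k, v in scores.items() if k in ("ents_p", "ents_r", "ents_f")})
```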