Custom spaCy NER Model for "Profession," "Facility," and "Experience" Entities
Overview
This spaCy Named Entity Recognition (NER) model has been custom-trained to recognize and classify three entity types in unstructured text: "profession," "facility," and "experience." It is intended for text analysis tasks that need these specific entities extracted, such as processing job postings.
Key Features
- Custom-trained for high accuracy in recognizing "profession," "facility," and "experience" entities.
- Suitable for various NLP tasks, such as information extraction and content categorization.
- Can be easily integrated into your existing spaCy-based NLP pipelines.
Usage
Installation
You can install the custom spaCy NER model using pip:
pip install https://huggingface.co/DaFull/en_core_web_sm_job/resolve/main/en_core_web_sm_job-any-py3-none-any.whl
Example Usage
Here's how you can use the model for entity recognition in Python:
import spacy
# Load the custom spaCy NER model
nlp = spacy.load("en_core_web_sm_job")
# Process your text
text = "HR Specialist needed at Google, Dallas, TX, with expertise in employee relations and a minimum of 4 years of HR experience."
doc = nlp(text)
# Extract named entities
for ent in doc.ents:
    print(f"Entity: {ent.text}, Type: {ent.label_}")
Entity Types
The model recognizes the following entity types:
- PROFESSION: Represents professions or job titles.
- FACILITY: Denotes facilities, buildings, or locations.
- EXPERIENCE: Identifies mentions of work experience, durations, or qualifications.
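Because the model also emits the stock spaCy labels (for example GPE or ORG), downstream code will often want only the three custom types. The following is a minimal sketch of one way to do that. The helper name `group_custom_entities` and the sample entities are hypothetical and are not part of the model package. The sample entities are stand-ins, not actual model output. With the model installed, you would pass `doc.ents` instead.

```python
from collections import defaultdict
from types import SimpleNamespace

# The three labels this model adds on top of the stock spaCy NER labels.
CUSTOM_LABELS = {"PROFESSION", "FACILITY", "EXPERIENCE"}


def group_custom_entities(ents, labels=CUSTOM_LABELS):
    """Group entity texts by label, keeping only the requested labels.

    `ents` is any iterable of objects exposing `.text` and `.label_`,
    which is the interface of the spans in spaCy's `doc.ents`.
    """
    grouped = defaultdict(list)
    for ent in ents:
        if ent.label_ in labels:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Stand-in entities so the sketch runs without the model installed.
# These spans and labels are invented for illustration only.
sample_ents = [
    SimpleNamespace(text="HR Specialist", label_="PROFESSION"),
    SimpleNamespace(text="Dallas", label_="GPE"),
    SimpleNamespace(text="4 years", label_="EXPERIENCE"),
]

print(group_custom_entities(sample_ents))
```

The function relies only on the `.text` and `.label_` attributes, so it works unchanged on real spaCy spans.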
Feature | Description |
---|---|
Name | en_core_web_sm_job |
Version | 3.6.0 |
spaCy | >=3.6.0,<3.7.0 |
Default Pipeline | tok2vec , tagger , parser , attribute_ruler , lemmatizer , ner |
Components | tok2vec , tagger , parser , senter , attribute_ruler , lemmatizer , ner |
Vectors | 514157 keys, 514157 unique vectors (300 dimensions) |
License | MIT |
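The spaCy constraint in the table (`>=3.6.0,<3.7.0`) means the package expects a 3.6.x installation. The sketch below shows one way to check a version string against that range before loading the model. The helper names `parse_version` and `is_supported_spacy` are hypothetical, and the parser assumes plain `X.Y.Z` version strings (it does not handle pre-release suffixes such as `rc1`).

```python
def parse_version(version):
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split(".")[:3])


def is_supported_spacy(version, lower="3.6.0", upper="3.7.0"):
    """Return True if lower <= version < upper (the range listed in the table)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


print(is_supported_spacy("3.6.1"))  # inside the supported range
print(is_supported_spacy("3.7.2"))  # outside the supported range
```

With spaCy installed, the installed version is available as `spacy.__version__` and can be passed to `is_supported_spacy` directly.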
Label Scheme
View label scheme (116 labels for 3 components)
Component | Labels |
---|---|
tagger | $ , '' , , , -LRB- , -RRB- , . , : , ADD , AFX , CC , CD , DT , EX , FW , HYPH , IN , JJ , JJR , JJS , LS , MD , NFP , NN , NNP , NNPS , NNS , PDT , POS , PRP , PRP$ , RB , RBR , RBS , RP , SYM , TO , UH , VB , VBD , VBG , VBN , VBP , VBZ , WDT , WP , WP$ , WRB , XX , _SP , `` |
parser | ROOT , acl , acomp , advcl , advmod , agent , amod , appos , attr , aux , auxpass , case , cc , ccomp , compound , conj , csubj , csubjpass , dative , dep , det , dobj , expl , intj , mark , meta , neg , nmod , npadvmod , nsubj , nsubjpass , nummod , oprd , parataxis , pcomp , pobj , poss , preconj , predet , prep , prt , punct , quantmod , relcl , xcomp |
ner | CARDINAL , DATE , EVENT , EXPERIENCE , FAC , FACILITY , GPE , LANGUAGE , LAW , LOC , MONEY , NORP , ORDINAL , ORG , PERCENT , PERSON , PRODUCT , PROFESSION , QUANTITY , TIME , WORK_OF_ART |
Accuracy
Type | Score |
---|---|
TOKEN_P | 75.57 |
TOKEN_R | 60.58 |
TOKEN_F | 67.57 |
CUSTOM_TAG_ACC | 73.35 |
Evaluation results
- NER Precision (self-reported): 0.752
- NER Recall (self-reported): 0.607
- NER F Score (self-reported): 0.674
- TAG (XPOS) Accuracy (self-reported): 0.733