|
Named Entity Recognition (NER) Model Card |
|
Model Overview |
|
Model Name: NER LSTM Model |
|
Description: An LSTM-based model for the Named Entity Recognition (NER) task. The model classifies each word in a text into a named entity category such as Person, Organization, or Location. |
|
|
|
Intended Use |
|
Primary Use Case: Extracting named entities (e.g., names of people, organizations, locations) from text. |
|
|
|
Usage Instructions: |
|
|
|
Install the required libraries: Make sure pandas, scikit-learn, keras, tensorflow, and any other dependencies are installed. |
|
Load the model and tokenizer: Load the model and tokenizer from the provided files. Because the model is a Keras LSTM, it would normally be loaded with Keras rather than with a Hugging Face Transformers model class. |
|
Tokenize input text: Preprocess the input text, tokenize it with the loaded tokenizer, and pad or truncate each sequence to the maximum sequence length (100). |
|
Make predictions: Feed the tokenized input through the model to obtain predictions for named entity categories. |
|
Post-process predictions: Use the LabelEncoder fitted during training (for example, its inverse_transform method) to map predicted class indices back to human-readable named entity categories. |
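
The tokenize, predict, and post-process steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the card's actual code: `encode_words`, `decode_tags`, the `word_index` mapping, the padding and out-of-vocabulary ids, and the choice of post-padding are all hypothetical. The `classes` list stands in for the `classes_` attribute of a fitted `LabelEncoder`; indexing into it gives the same result as `inverse_transform`.

```python
import numpy as np

MAX_LEN = 100  # "Max Sequence Length" from this card


def encode_words(words, word_index, max_len=MAX_LEN, pad_id=0, oov_id=1):
    """Map words to integer ids, then truncate or post-pad to max_len.

    pad_id, oov_id, and post-padding are assumptions; the card does not
    say how the original tokenizer handles them.
    """
    ids = [word_index.get(w, oov_id) for w in words][:max_len]
    return np.array(ids + [pad_id] * (max_len - len(ids)), dtype="int32")


def decode_tags(probs, classes, n_words):
    """Pick the most probable class per position and map it to its label.

    probs has shape (max_len, num_classes); only the first n_words
    positions correspond to real (non-padding) words.
    """
    tag_ids = probs.argmax(axis=-1)[:n_words]
    return [classes[i] for i in tag_ids]
```

With a loaded Keras model, the probabilities would typically come from something like `model.predict(x[None, :])[0]`, where `x` is the output of `encode_words`.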
|
Performance and Evaluation |
|
Performance Metrics: |
|
|
|
Test Loss: The loss value achieved on the test dataset. |
|
Test Accuracy: The accuracy achieved on the test dataset. |
|
Training Accuracy: The accuracy achieved on the training dataset. |
|
Validation Accuracy: The accuracy achieved on the validation dataset. |
|
Performance Summary: |
|
|
|
The model achieved an accuracy of approximately [Test Accuracy] on the test dataset. |
|
Training and validation accuracies are provided for reference. |
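
For readers unfamiliar with how these numbers are produced, the sketch below computes a sparse categorical cross-entropy loss and a token-level accuracy with NumPy. It assumes both metrics are averaged over every token position, padding included; the card does not state whether padding was masked, and the function names are illustrative.

```python
import numpy as np


def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean negative log-probability assigned to the true class.

    y_true: integer class ids, shape (batch, seq_len)
    probs:  predicted probabilities, shape (batch, seq_len, num_classes)
    """
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    picked = np.take_along_axis(probs, y_true[..., None], axis=-1)[..., 0]
    return float(-np.log(picked).mean())


def token_accuracy(y_true, probs):
    """Fraction of positions where the most probable class is the true one."""
    return float((probs.argmax(axis=-1) == y_true).mean())
```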
|
Dataset |
|
Dataset Name: NER dataset.csv |
|
Description: The dataset contains labeled data for named entity recognition. It includes columns for 'Word' and 'POS' (Part-of-Speech) labels. |
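
A minimal loading sketch with pandas, assuming the file is a plain comma-separated CSV with a header row. The helper name `load_ner_csv` is hypothetical, and only the two columns named above are checked, since the card lists no others. The file's encoding is not stated, so an `encoding=` argument may be needed in practice.

```python
import pandas as pd

REQUIRED_COLUMNS = ("Word", "POS")  # the columns this card mentions


def load_ner_csv(source):
    """Read the dataset and verify that the documented columns exist.

    source may be a file path (e.g. "NER dataset.csv") or a file-like object.
    """
    df = pd.read_csv(source)
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return df
```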
|
|
|
Model Details |
|
Architecture: |
|
|
|
Embedding Layer: Converts input tokens into dense vectors. |
|
LSTM Layer: Processes the sequence of word embeddings. |
|
Dense Layer: Produces a probability distribution over named entity categories. |
|
Hyperparameters: |
|
|
|
Embedding Dimension: 100 |
|
LSTM Units: 128 |
|
Batch Size: 64 |
|
Max Sequence Length: 100 |
|
Optimizer: Adam |
|
Loss Function: Sparse Categorical Cross-Entropy |
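
The architecture and hyperparameters above could be assembled in Keras roughly as follows. This is a sketch, not the card's actual training code: `vocab_size` and `num_tags` are not given in the card and must be supplied, the function name `build_ner_lstm` is hypothetical, and `return_sequences=True` with a softmax output is an assumption made so that the model emits one prediction per word.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_ner_lstm(vocab_size, num_tags, max_len=100,
                   embedding_dim=100, lstm_units=128):
    """Embedding -> LSTM -> Dense, using the hyperparameters listed above."""
    model = keras.Sequential([
        keras.Input(shape=(max_len,), dtype="int32"),
        # Converts token ids into dense vectors of size embedding_dim.
        layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim),
        # Assumption: one output per time step, for per-word tagging.
        layers.LSTM(lstm_units, return_sequences=True),
        # Probability distribution over entity categories at each position.
        layers.Dense(num_tags, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training would then pass the batch size from the card, for example `model.fit(X_train, y_train, batch_size=64, ...)`. The number of epochs is not listed and is left open here.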