wnut-distilbert-finetuned

This model is a fine-tuned version of distilbert/distilbert-base-uncased on the WNUT 2017 dataset for Named Entity Recognition (NER).

Model Description

The wnut-distilbert-finetuned model is designed for token classification, specifically Named Entity Recognition (NER). It uses the DistilBERT architecture, a smaller, faster distillation of BERT that retains most of BERT's accuracy at a fraction of the computational cost.

Intended Uses & Limitations

Intended Uses

  • Named Entity Recognition (NER): Extract and classify entities such as names, locations, organizations, etc., from text.
  • Text Analysis: Enhance applications in information extraction, question answering, and text understanding.

How to Use

This model can be loaded with the Hugging Face Transformers library. Below is an example of how to run inference:

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("Ashaduzzaman/wnut-distilbert-finetuned")
model = AutoModelForTokenClassification.from_pretrained("Ashaduzzaman/wnut-distilbert-finetuned")

# Create a pipeline for NER
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

# Example inference
text = "Hugging Face Inc. is based in New York City."
entities = ner_pipeline(text)

print(entities)
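
By default, the pipeline returns one prediction per subword token. To group subwords into whole-entity spans, you can pass an aggregation_strategy (a standard option of the Transformers NER pipeline):

# Group subword tokens into whole-entity spans
ner_pipeline = pipeline(
    "ner",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)
print(ner_pipeline(text))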

Limitations

  • Domain Shift: Performance may degrade on text that differs from the noisy, user-generated text of WNUT 2017.
  • Entity Types: The model only recognizes the entity types annotated in WNUT 2017 and will not detect types outside that set.
  • Data Sensitivity: The model may reflect biases or gaps present in its training data.

Training and Evaluation Data

Training Data

  • Dataset: WNUT 2017 (Emerging and Rare Entities), a corpus of noisy user-generated text (tweets, Reddit and YouTube comments, StackExchange posts) annotated with six entity types: person, location, corporation, product, creative-work, and group.
  • Data Split: Training and validation splits of the WNUT 2017 dataset were used during the fine-tuning process.

Evaluation Data

  • Dataset: WNUT 2017 test set, used to evaluate model performance after fine-tuning.
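
For reference, both splits can be loaded with the Hugging Face datasets library. A minimal sketch, assuming the canonical wnut_17 dataset id on the Hub:

from datasets import load_dataset

# WNUT 2017 ships with train/validation/test splits; each example
# carries pre-tokenized "tokens" and integer "ner_tags" labels
wnut = load_dataset("wnut_17")
print(wnut)
print(wnut["train"][0]["tokens"])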

Training Procedure

Training Hyperparameters

  • Learning Rate: 2e-05
  • Train Batch Size: 16
  • Eval Batch Size: 16
  • Seed: 42
  • Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • Learning Rate Scheduler: Linear
  • Number of Epochs: 3
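
As a rough sketch, these settings map onto transformers TrainingArguments as follows (the output directory and per-epoch evaluation are assumptions inferred from the results table below, not stated in the card):

from transformers import TrainingArguments

# Adam betas/epsilon and the linear LR schedule are already the
# Trainer defaults, so only the listed values need to be set.
training_args = TrainingArguments(
    output_dir="wnut-distilbert-finetuned",  # assumed name
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    seed=42,
    eval_strategy="epoch",  # assumed: the table reports metrics per epoch
)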

Training Results

| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1     | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
| No log        | 1.0   | 213  | 0.2751          | 0.5114    | 0.2289 | 0.3163 | 0.9385   |
| No log        | 2.0   | 426  | 0.2627          | 0.5398    | 0.3327 | 0.4117 | 0.9434   |
| 0.1832        | 3.0   | 639  | 0.2704          | 0.5336    | 0.3383 | 0.4141 | 0.9444   |
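
The precision, recall, and F1 above are entity-level scores of the kind typically computed with seqeval; a minimal sketch via the evaluate library (the toy labels here are illustrative, not taken from the card):

import evaluate

# seqeval scores complete entity spans rather than individual tokens
seqeval = evaluate.load("seqeval")

predictions = [["O", "B-person", "I-person", "O"]]
references = [["O", "B-person", "I-person", "O"]]

results = seqeval.compute(predictions=predictions, references=references)
print(results["overall_precision"], results["overall_recall"], results["overall_f1"])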

Framework Versions

  • Transformers: 4.42.4
  • Pytorch: 2.3.1+cu121
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1