metadata

library_name: transformers
license: mit
base_model: bert-base-cased
tags:
  - generated_from_trainer
metrics:
  - precision
  - recall
  - f1
  - accuracy
model-index:
  - name: searchqueryner-be
    results: []
datasets:
  - putazon/searchqueryner-100k
language:
  - en
  - es
pipeline_tag: token-classification

bert-finetuned-ner

This model is a fine-tuned version of bert-base-cased on the SearchQueryNER-100k dataset. It achieves the following results on the evaluation set:

Loss: 0.0005
Precision: 0.9999
Recall: 0.9999
F1: 0.9999
Accuracy: 0.9999
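
The card does not state how these metrics were computed. A common convention for NER is to report precision, recall, and F1 at the entity level, where a predicted span counts as correct only if both its label and its boundaries match the reference. The following is a minimal sketch of that convention in plain Python, assuming BIO-style tags; the label names `LOC` and `PROF` are hypothetical and are not taken from the dataset.

```python
def bio_spans(tags):
    """Return the set of (label, start, end) spans encoded by one BIO tag sequence."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes any open span
        prefix, _, name = tag.partition("-")
        # An open span ends on "O", on a new "B-", or on an "I-" with a different label.
        if label is not None and (prefix != "I" or name != label):
            spans.add((label, start, i))
            start, label = None, None
        # Lenient choice (an assumption): a stray "I-" with no open span starts a new one.
        if prefix == "B" or (prefix == "I" and label is None):
            start, label = i, name
    return spans


def entity_prf(gold_seqs, pred_seqs):
    """Entity-level precision, recall, and F1 over parallel lists of tag sequences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_spans(gold), bio_spans(pred)
        tp += len(g & p)      # spans that match exactly in label and boundaries
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels, for illustration only.
gold = [["B-LOC", "I-LOC", "O", "B-PROF"]]
pred = [["B-LOC", "I-LOC", "O", "O"]]
precision, recall, f1 = entity_prf(gold, pred)
```

In this toy case the prediction finds one of the two reference entities, so precision is 1.0 and recall is 0.5. Token-level accuracy, by contrast, would simply count matching tags position by position.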

Model description

This model is fine-tuned for named entity recognition (NER) on search queries. It extracts structured entities from short, keyword-style texts to support query understanding. It was trained on the SearchQueryNER-100k dataset, which contains 13 entity types.

Intended uses & limitations

Intended uses:

Extracting named entities such as locations, professions, and attributes from user search queries.
Optimizing search engines by improving query understanding.
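
As a rough sketch of the first use case, the model can be loaded through the `transformers` token-classification pipeline and its predictions grouped by entity type. The repository id below is a hypothetical placeholder (the card does not give the model's Hub id), and the `min_score` threshold and the `entities_by_type` helper are illustrative choices, not part of the model.

```python
from collections import defaultdict

# Assumption: hypothetical repo id; replace with the model's actual Hub id or a local path.
MODEL_ID = "putazon/searchqueryner-be"


def load_ner(model_id=MODEL_ID):
    """Build a token-classification pipeline that merges word pieces into entity spans."""
    from transformers import pipeline  # lazy import: the helper below does not need it

    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def entities_by_type(predictions, min_score=0.5):
    """Group aggregated pipeline predictions into {entity_type: [text, ...]}."""
    grouped = defaultdict(list)
    for pred in predictions:
        # With aggregation enabled, each prediction carries "entity_group", "score", "word".
        if pred["score"] >= min_score:
            grouped[pred["entity_group"]].append(pred["word"])
    return dict(grouped)
```

A call such as `entities_by_type(load_ner()("plumber near downtown madrid"))` would then return a dictionary keyed by whichever of the 13 entity types the model detects in the query.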

Limitations:

The model may not generalize well to domains outside of search queries.

Training and evaluation data

The training and evaluation data were sourced from the SearchQueryNER-100k dataset. The dataset includes tokenized search queries annotated with 13 entity types, divided into training, validation, and test sets:

Training set: 102,931 examples
Validation set: 20,420 examples
Test set: 20,301 examples
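
The split sizes above can be checked with a few lines of arithmetic. The sketch below also shows how the data might be fetched with the `datasets` library using the dataset id from the card metadata; the split names on the Hub (`train`, `validation`, `test`) are an assumption.

```python
# Split sizes as reported in this card.
SPLIT_SIZES = {"train": 102_931, "validation": 20_420, "test": 20_301}


def split_fractions(sizes):
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: n / total for name, n in sizes.items()}


def load_splits(dataset_id="putazon/searchqueryner-100k"):
    """Download the dataset from the Hub (needs network access and `datasets`)."""
    from datasets import load_dataset  # lazy import: only needed when downloading

    return load_dataset(dataset_id)


fractions = split_fractions(SPLIT_SIZES)
for name, frac in fractions.items():
    print(f"{name}: {SPLIT_SIZES[name]:,} examples ({frac:.1%})")
```

The three splits total 143,652 examples, which works out to roughly a 72/14/14 division.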

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: ADAMW_TORCH with betas=(0.9,0.999), epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 3
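
For readers who want to reproduce a similar run, these settings map onto `transformers.TrainingArguments` roughly as follows. This is a sketch, not the original training script: it assumes a single device (so the reported batch sizes are per-device values), and the output directory name is a hypothetical placeholder.

```python
# Hyperparameters from the list above, keyed by TrainingArguments parameter names.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,  # assumption: single device
    "per_device_eval_batch_size": 8,   # assumption: single device
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3,
}


def build_training_args(output_dir="searchqueryner-out"):
    """Create TrainingArguments from the settings above (output_dir is hypothetical)."""
    from transformers import TrainingArguments  # lazy import: requires transformers + torch

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Settings the card does not list, such as warmup or weight decay, are left at the library defaults here.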

Training results

| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|---|
| 0.0011 | 1.0 | 12867 | 0.0009 | 0.9999 | 0.9999 | 0.9999 | 0.9999 |
| 0.002 | 2.0 | 25734 | 0.0004 | 0.9999 | 0.9999 | 0.9999 | 0.9999 |
| 0.0005 | 3.0 | 38601 | 0.0005 | 0.9999 | 0.9999 | 0.9999 | 0.9999 |
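
The Step column is consistent with the reported training set size and batch size. The sketch below reproduces the step counts and, as an illustration of the linear scheduler, the learning rate at each epoch boundary. It assumes one device, no gradient accumulation, and zero warmup steps, none of which the card states explicitly.

```python
import math

TRAIN_EXAMPLES = 102_931
BATCH_SIZE = 8
NUM_EPOCHS = 3
BASE_LR = 2e-5


def steps_per_epoch(n_examples, batch_size):
    """Optimizer steps per epoch; the last, partial batch still counts as a step."""
    return math.ceil(n_examples / batch_size)


def linear_lr(step, total_steps, base_lr, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


per_epoch = steps_per_epoch(TRAIN_EXAMPLES, BATCH_SIZE)
total = per_epoch * NUM_EPOCHS
for epoch in range(1, NUM_EPOCHS + 1):
    step = per_epoch * epoch
    print(f"epoch {epoch}: step {step}, lr {linear_lr(step, total, BASE_LR):.2e}")
```

Under these assumptions each epoch takes 12,867 steps, matching the table, and the learning rate has decayed to two thirds of its initial value by the end of the first epoch and to zero by the end of training.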

Framework versions

Transformers 4.48.1
PyTorch 2.5.1+cu124
Datasets 3.2.0
Tokenizers 0.21.0