---
license: apache-2.0
tags:
- generated_from_trainer
base_model: bert-base-uncased
datasets:
- conll2003
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: NER_Pittsburgh_TAA
  results:
  - task:
      type: token-classification
      name: Token Classification
    dataset:
      name: conll2003
      type: conll2003
      config: conll2003
      split: validation
      args: conll2003
    metrics:
    - type: precision
      value: 0.9429236395877203
      name: Precision
    - type: recall
      value: 0.9517843159190066
      name: Recall
    - type: f1
      value: 0.9473332591025497
      name: F1
    - type: accuracy
      value: 0.9867030994328562
      name: Accuracy
language:
- en
- uk
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# NER_Pittsburgh_TAA
This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the conll2003 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0860
- Precision: 0.9429
- Recall: 0.9518
- F1: 0.9473
- Accuracy: 0.9867
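The reported F1 is the harmonic mean of precision and recall. As a quick sanity check, the sketch below recomputes it from the full-precision values in this card's metadata; `f1_score` is a small illustrative helper, not the evaluation code that produced the numbers.

```python
# Full-precision values copied from this card's metadata.
precision = 0.9429236395877203
recall = 0.9517843159190066


def f1_score(p: float, r: float) -> float:
    """Return the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


f1 = f1_score(precision, recall)
print(f"F1 = {f1:.4f}")
```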
## Model description
The model was built as a practical machine learning exercise: it is a BERT model fine-tuned for Named Entity Recognition (NER).
The dataset used is conll2003, a standard benchmark for training NER models, that is, models that locate and classify named entities (such as persons, organizations, and locations) in a sentence.
You can try the model either through the interface provided by Hugging Face or by loading it in code:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the fine-tuned tokenizer and token-classification model from the Hub.
tokenizer = AutoTokenizer.from_pretrained("CineAI/NER_Pittsburgh_TAA")
model = AutoModelForTokenClassification.from_pretrained("CineAI/NER_Pittsburgh_TAA")
```
As for the name, it has three parts: NER is the task the model is for, Pittsburgh is the name of a cool song, and TAA stands for The Amity Affliction, the band that wrote it.
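A token-classification model predicts one label per token, so the raw output usually has to be grouped into entity spans. The sketch below shows one way to do that, assuming the labels follow the CoNLL-2003 BIO scheme (`B-PER`, `I-PER`, ..., `O`), which this card does not state explicitly. `group_entities` is a hypothetical helper written for illustration; it is not part of `transformers`, and the sample tokens and labels are made up rather than real model output.

```python
def group_entities(tokens, labels):
    """Merge BIO-tagged tokens into (entity_type, text) spans."""
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Store the span collected so far, if any.
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A B- tag always starts a new span.
            flush()
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            # An I- tag of the same type continues the open span.
            current_tokens.append(token)
        else:
            # "O" (or an I- tag that does not match) closes the open span.
            flush()
            current_type, current_tokens = None, []
    flush()
    return entities


# Made-up example input, lowercased as an uncased model would see it.
tokens = ["the", "amity", "affliction", "played", "in", "pittsburgh"]
labels = ["O", "B-ORG", "I-ORG", "O", "O", "B-LOC"]
entities = group_entities(tokens, labels)
print(entities)
```

In this sketch an `I-` tag that does not continue a span of the same type is simply dropped; stricter or more lenient policies are equally reasonable.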
## Intended uses & limitations
Anyone can use this model; it is free and distributed under the Apache 2.0 license.
## Training and evaluation data
Both training and evaluation use the conll2003 dataset; the reported metrics were computed on its validation split.
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
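With `lr_scheduler_type: linear`, the learning rate decays linearly from its initial value to zero over the course of training. The sketch below illustrates that schedule, assuming no warmup steps (the card lists none) and the 2195 total steps shown in the training results. `linear_lr` is a hypothetical helper for illustration, not the Trainer's own implementation.

```python
def linear_lr(step, base_lr=2e-5, total_steps=2195, warmup_steps=0):
    """Learning rate at a given optimizer step under linear decay.

    warmup_steps=0 is an assumption; the card does not list a warmup setting.
    """
    if step < warmup_steps:
        # Linear ramp-up during warmup (unused when warmup_steps is 0).
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the end of each of the five epochs (439 steps per epoch).
for epoch in range(1, 6):
    step = 439 * epoch
    print(epoch, step, linear_lr(step))
```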
### Training results
| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
| No log | 1.0 | 439 | 0.0863 | 0.9437 | 0.9444 | 0.9440 | 0.9861 |
| 0.0024 | 2.0 | 878 | 0.0995 | 0.9394 | 0.9442 | 0.9418 | 0.9852 |
| 0.0021 | 3.0 | 1317 | 0.0904 | 0.9355 | 0.9463 | 0.9409 | 0.9856 |
| 0.0012 | 4.0 | 1756 | 0.0835 | 0.9427 | 0.9514 | 0.9471 | 0.9867 |
| 0.0009 | 5.0 | 2195 | 0.0860 | 0.9429 | 0.9518 | 0.9473 | 0.9867 |
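The step counts in the table are consistent with the batch size and number of epochs listed above. The sketch below checks this, assuming the conll2003 training split has 14,041 examples and that training ran on a single device without gradient accumulation; none of these assumptions is stated in this card.

```python
import math

TRAIN_EXAMPLES = 14_041  # assumption: size of the conll2003 train split
TRAIN_BATCH_SIZE = 32    # from the hyperparameters listed in this card
NUM_EPOCHS = 5           # from the hyperparameters listed in this card

# The last, partially filled batch still counts as one step.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / TRAIN_BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS

print(steps_per_epoch, total_steps)
```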
### Framework versions
- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1