---
language:
- es
license: cc-by-4.0
library_name: span-marker
tags:
- span-marker
- token-classification
- ner
- named-entity-recognition
- generated_from_span_marker_trainer
datasets:
- conll2002
metrics:
- precision
- recall
- f1
widget:
- text: George Washington estuvo en Washington.
pipeline_tag: token-classification
base_model: PlanTL-GOB-ES/roberta-base-bne
model-index:
- name: SpanMarker with PlanTL-GOB-ES/roberta-base-bne on conll2002
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      name: conll2002
      type: conll2002
      split: eval
    metrics:
    - type: f1
      value: 0.871172868582195
      name: F1
    - type: precision
      value: 0.888328530259366
      name: Precision
    - type: recall
      value: 0.8546672828096118
      name: Recall
---

# SpanMarker with PlanTL-GOB-ES/roberta-base-bne on conll2002

This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained on the [conll2002](https://huggingface.co/datasets/conll2002) dataset that can be used for Named Entity Recognition. This SpanMarker model uses [PlanTL-GOB-ES/roberta-base-bne](https://huggingface.co/PlanTL-GOB-ES/roberta-base-bne) as the underlying encoder.

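The precision, recall, and F1 values in the metadata at the top of this card can be cross-checked against each other, since F1 is conventionally the harmonic mean of precision and recall. The sketch below is only a sanity check under that assumption; the helper name `f1_score` is hypothetical and is not part of SpanMarker, and the three numbers are copied from the card's metadata.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the model-index metadata of this card.
reported_precision = 0.888328530259366
reported_recall = 0.8546672828096118
reported_f1 = 0.871172868582195

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.6f}, reported F1 = {reported_f1:.6f}")
```

The zero guard covers the case where both precision and recall are 0.0, where the harmonic mean would otherwise divide by zero.
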
## Model Details

### Model Description

- **Model Type:** SpanMarker
- **Encoder:** [PlanTL-GOB-ES/roberta-base-bne](https://huggingface.co/PlanTL-GOB-ES/roberta-base-bne)
- **Maximum Sequence Length:** 256 tokens
- **Maximum Entity Length:** 8 words
- **Training Dataset:** [conll2002](https://huggingface.co/datasets/conll2002)
- **Languages:** es
- **License:** cc-by-4.0

### Model Sources

- **Repository:** [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER)
- **Thesis:** [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf)

### Model Labels

| Label | Examples                                                          |
|:------|:------------------------------------------------------------------|
| LOC   | "Australia", "Victoria", "Melbourne"                              |
| MISC  | "Ley", "Ciudad", "CrimeNet"                                       |
| ORG   | "Commonwealth", "EFE", "Tribunal Supremo"                         |
| PER   | "Abogado General del Estado", "Daryl Williams", "Abogado General" |

## Uses

### Direct Use for Inference

```python
from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("alvarobartt/span-marker-roberta-base-bne-conll-2002-es")
# Run inference
entities = model.predict("George Washington estuvo en Washington.")
```

## Training Details

### Training Set Metrics

| Training set          | Min | Median  | Max  |
|:----------------------|:----|:--------|:-----|
| Sentence length       | 1   | 31.8052 | 1238 |
| Entities per sentence | 0   | 2.2586  | 160  |

### Training Hyperparameters

- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2

### Training Results

| Epoch  | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
|:------:|:----:|:---------------:|:--------------------:|:-----------------:|:-------------:|:-------------------:|
| 0.1188 | 100  | 0.0704          | 0.0                  | 0.0               | 0.0           | 0.8608              |
| 0.2375 | 200  | 0.0279          | 0.8765               | 0.4034            | 0.5525        | 0.9025              |
| 0.3563 | 300  | 0.0158          | 0.8381               | 0.7211            | 0.7752        | 0.9524              |
| 0.4751 | 400  | 0.0134          | 0.8525               | 0.7463            | 0.7959        | 0.9576              |
| 0.5938 | 500  | 0.0130          | 0.8844               | 0.7549            | 0.8145        | 0.9560              |
| 0.7126 | 600  | 0.0119          | 0.8480               | 0.8006            | 0.8236        | 0.9650              |
| 0.8314 | 700  | 0.0098          | 0.8794               | 0.8408            | 0.8597        | 0.9695              |
| 0.9501 | 800  | 0.0091          | 0.8842               | 0.8360            | 0.8594        | 0.9722              |
| 1.0689 | 900  | 0.0093          | 0.8976               | 0.8387            | 0.8672        | 0.9698              |
| 1.1876 | 1000 | 0.0094          | 0.8880               | 0.8517            | 0.8694        | 0.9739              |
| 1.3064 | 1100 | 0.0086          | 0.8920               | 0.8530            | 0.8721        | 0.9737              |
| 1.4252 | 1200 | 0.0092          | 0.8896               | 0.8452            | 0.8668        | 0.9728              |
| 1.5439 | 1300 | 0.0094          | 0.8765               | 0.8313            | 0.8533        | 0.9720              |
| 1.6627 | 1400 | 0.0089          | 0.8805               | 0.8445            | 0.8621        | 0.9720              |
| 1.7815 | 1500 | 0.0088          | 0.8834               | 0.8581            | 0.8706        | 0.9747              |
| 1.9002 | 1600 | 0.0088          | 0.8883               | 0.8547            | 0.8712        | 0.9747              |

### Framework Versions

- Python: 3.10.12
- SpanMarker: 1.3.1.dev
- Transformers: 4.33.2
- PyTorch: 2.0.1+cu118
- Datasets: 2.14.5
- Tokenizers: 0.13.3

## Citation

### BibTeX

```
@software{Aarsen_SpanMarker,
    author = {Aarsen, Tom},
    license = {Apache-2.0},
    title = {{SpanMarker for Named Entity Recognition}},
    url = {https://github.com/tomaarsen/SpanMarkerNER}
}
```
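
Returning to the inference snippet under "Direct Use for Inference": the card does not show what to do with the `entities` result. The sketch below is one possible way to post-process it, assuming that `model.predict` returns a list of dictionaries with `span`, `label`, and `score` keys. That output shape is an assumption here, so check it against the SpanMarker version you have installed. The sample predictions are hand-written placeholders, not real model output, and `group_spans_by_label` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical predictions, hand-written for illustration only. The dict keys
# ("span", "label", "score") are an assumption about what model.predict
# returns; the scores are placeholders, not actual model output.
sample_entities = [
    {"span": "George Washington", "label": "PER", "score": 0.99},
    {"span": "Washington", "label": "LOC", "score": 0.98},
]


def group_spans_by_label(entities, min_score=0.0):
    """Group entity spans by label, dropping entities scored below min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


grouped = group_spans_by_label(sample_entities)
print(grouped)
```

In practice, `sample_entities` would be replaced by the `entities` list from the inference snippet, and `min_score` could be raised to keep only high-confidence spans.
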