---
language: en
license: cc-by-sa-4.0
library_name: span-marker
tags:
- span-marker
- token-classification
- ner
- named-entity-recognition
- generated_from_span_marker_trainer
metrics:
- precision
- recall
- f1
widget:
- text: Altitude measurements based on near - IR imaging in H and Hcont filters showed that the deeper BS2 clouds were located near the methane condensation level ( ≈1.2bars ) , while BS1 was generally ∼500 mb above that level ( at lower pressures ) .
- text: However , our model predicts different performance for large enough memory - access latency and validates the intuition that the dynamic programming algorithm performs better on these machines .
- text: We established a P fertilizer need map based on integrating results from the two systems .
- text: Here , we have addressed this limitation for the endodermal lineage by developing a defined culture system to expand and differentiate human foregut stem cells ( hFSCs ) derived from hPSCs . hFSCs can self - renew while maintaining their capacity to differentiate into pancreatic and hepatic cells .
- text: The accumulated percentage gain from selection amounted to 51%/1 % lower Striga infestation ( measured by area under Striga number progress curve , ASNPC ) , 46%/62 % lower downy mildew incidence , and 49%/31 % higher panicle yield of the C5 - FS compared to the mean of the genepool parents at Sadoré / Cinzana , respectively .
pipeline_tag: token-classification
base_model: allenai/specter2_base
model-index:
- name: SpanMarker with allenai/specter2_base on my-data
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      name: my-data
      type: unknown
      split: test
    metrics:
    - type: f1
      value: 0.6906354515050167
      name: F1
    - type: precision
      value: 0.7108433734939759
      name: Precision
    - type: recall
      value: 0.6715447154471544
      name: Recall
---

# SpanMarker with allenai/specter2_base on my-data

This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model that can be used for Named Entity Recognition. This SpanMarker model uses [allenai/specter2_base](https://huggingface.co/allenai/specter2_base) as the underlying encoder.

## Model Details

### Model Description
- **Model Type:** SpanMarker
- **Encoder:** [allenai/specter2_base](https://huggingface.co/allenai/specter2_base)
- **Maximum Sequence Length:** 256 tokens
- **Maximum Entity Length:** 8 words
- **Language:** en
- **License:** cc-by-sa-4.0

### Model Sources
- **Repository:** [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER)
- **Thesis:** [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf)

### Model Labels
| Label    | Examples                                                                                                |
|:---------|:--------------------------------------------------------------------------------------------------------|
| Data     | "Depth time - series", "defect", "an overall mitochondrial"                                             |
| Material | "cross - shore measurement locations", "the subject 's fibroblasts", "COXI , COXII and COXIII subunits" |
| Method   | "an approximation", "EFSA", "in vitro"                                                                  |
| Process  | "intake", "a significant reduction of synthesis", "translation"                                         |

## Evaluation

### Metrics
| Label    | Precision | Recall | F1     |
|:---------|:----------|:-------|:-------|
| **all**  | 0.7108    | 0.6715 | 0.6906 |
| Data     | 0.6591    | 0.6138 | 0.6356 |
| Material | 0.795     | 0.7910 | 0.7930 |
| Method   | 0.5       | 0.45   | 0.4737 |
| Process  | 0.6898    | 0.6293 | 0.6582 |

## Uses

### Direct Use for Inference

```python
from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("span-marker-allenai/specter2_base-me")
# Run inference
entities = model.predict("We established a P fertilizer need map based on integrating results from the two systems .")
```

### Downstream Use
You can finetune this model on your own dataset.

```python
from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("span-marker-allenai/specter2_base-me")

# Specify a Dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("conll2003")  # For example CoNLL2003

# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("span-marker-allenai/specter2_base-me-finetuned")
```
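
The fine-tuning snippet expects a dataset with one column of word tokens and one column of per-token tag ids. As a minimal sketch of what a single row might look like, the pure-Python helper below turns word-level span annotations into IOB2 tag ids. The label order in `LABELS`, the helper name `spans_to_ner_tags`, and the example annotation are all assumptions made for illustration; they are not taken from this model's configuration or training data.

```python
# Hypothetical IOB2 label list built from the four labels in this card.
# The actual label order used by the model is an assumption here.
ENTITY_TYPES = ["Data", "Material", "Method", "Process"]
LABELS = ["O"] + [f"{prefix}-{name}" for name in ENTITY_TYPES for prefix in ("B", "I")]
LABEL2ID = {label: idx for idx, label in enumerate(LABELS)}


def spans_to_ner_tags(tokens, spans):
    """Convert word-level (start, end, label) spans, end exclusive, to IOB2 tag ids."""
    tags = [LABEL2ID["O"]] * len(tokens)
    for start, end, label in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span {(start, end)} is outside the sentence")
        tags[start] = LABEL2ID[f"B-{label}"]      # first word of the entity
        for i in range(start + 1, end):
            tags[i] = LABEL2ID[f"I-{label}"]      # remaining words of the entity
    return tags


tokens = ("We established a P fertilizer need map based on "
          "integrating results from the two systems .").split()
# Illustrative annotation only, not a gold label from the training set.
spans = [(2, 7, "Data")]

example = {"tokens": tokens, "ner_tags": spans_to_ner_tags(tokens, spans)}
print(example["ner_tags"])
```

Rows of this shape could then be collected into a 🤗 `datasets.Dataset` (for instance with `Dataset.from_list`) and passed to the `Trainer` in place of CoNLL2003. Note that annotated spans longer than 8 words would exceed the maximum entity length listed under Model Description.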

## Training Details

### Training Set Metrics
| Training set          | Min | Median  | Max |
|:----------------------|:----|:--------|:----|
| Sentence length       | 3   | 25.6049 | 106 |
| Entities per sentence | 0   | 5.2439  | 22  |

### Training Hyperparameters
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10

### Framework Versions
- Python: 3.10.12
- SpanMarker: 1.5.0
- Transformers: 4.36.2
- PyTorch: 2.0.1+cu118
- Datasets: 2.16.1
- Tokenizers: 0.15.0

## Citation

### BibTeX
```
@software{Aarsen_SpanMarker,
    author = {Aarsen, Tom},
    license = {Apache-2.0},
    title = {{SpanMarker for Named Entity Recognition}},
    url = {https://github.com/tomaarsen/SpanMarkerNER}
}
```