Model Card: bert-base-cased-biomedical-ner

Model Details

Model Name: bert-base-cased-biomedical-ner
Model Architecture: BERT (Bidirectional Encoder Representations from Transformers)
Pre-trained Model: bert-base-cased
Fine-tuned on: SourceData Dataset

Model Description

bert-base-cased-biomedical-ner is a fine-tuned variant of BERT (Bidirectional Encoder Representations from Transformers) built for Named Entity Recognition (NER) in the biomedical domain. It was fine-tuned on the SourceData dataset, a large biomedical corpus created for machine learning and AI in the publishing context.

NER is a core task in natural language processing. In the biomedical field, identifying and classifying entities such as genes, proteins, and diseases underpins applications including information retrieval, knowledge extraction, and data mining.

Intended Use

The bert-base-cased-biomedical-ner model is intended for NER tasks within the biomedical domain. Example applications include:

Identifying and extracting biomedical entities (e.g., genes, proteins, diseases) from unstructured text.
Enhancing information retrieval systems for scientific literature.
Supporting knowledge extraction and data mining from biomedical literature.
Facilitating the creation of structured biomedical databases.

Labels

Label	Description
SMALL_MOLECULE	Small molecules
GENEPROD	Gene products (genes and proteins)
SUBCELLULAR	Subcellular components
CELL_LINE	Cell lines
CELL_TYPE	Cell types
TISSUE	Tissues and organs
ORGANISM	Species
DISEASE	Diseases
EXP_ASSAY	Experimental assays
Source of label information: EMBO/SourceData Dataset

Usage

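from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer
import pandas as pd

# Load the fine-tuned tokenizer and token-classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained("Kushtrim/bert-base-cased-biomedical-ner")
model = AutoModelForTokenClassification.from_pretrained("Kushtrim/bert-base-cased-biomedical-ner")

# aggregation_strategy='first' merges word pieces into whole-word entities,
# labelling each word with the tag of its first token
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy='first')

text = "Add your text here"

# Each result is a dict with entity_group, score, word, start, and end
results = ner(text)

# Show the detected entities as a table
df = pd.DataFrame.from_records(results)
print(df)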
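
The pipeline output is a flat list of entity dictionaries, so a common next step is to group the detected spans by label and drop low-confidence ones. The sketch below is one possible way to do that. The helper name `group_entities`, the `min_score` threshold of 0.5, and the sample records are assumptions made for illustration; the sample records are not real model output.

```python
from collections import defaultdict


def group_entities(results, min_score=0.5):
    """Group aggregated NER results by label, skipping low-confidence spans.

    `results` is a list of dicts shaped like the output of a transformers
    "ner" pipeline run with an aggregation strategy (keys: entity_group,
    score, word, start, end).
    """
    grouped = defaultdict(list)
    for ent in results:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hypothetical records for illustration only; not produced by the model.
sample_results = [
    {"entity_group": "GENEPROD", "score": 0.98, "word": "p53", "start": 0, "end": 3},
    {"entity_group": "CELL_LINE", "score": 0.91, "word": "HeLa", "start": 22, "end": 26},
    {"entity_group": "DISEASE", "score": 0.32, "word": "stress", "start": 40, "end": 46},
]

grouped = group_entities(sample_results)
print(grouped)
```

In practice, `sample_results` would be replaced by the `results` list returned by the pipeline, and the threshold tuned to the precision the application needs.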

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 3
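
To make the `linear` scheduler setting concrete, the sketch below computes the learning rate at a given optimizer step under a linear decay from 2e-05 to zero. It rests on several assumptions not stated in this card: no warmup steps, a single device, no gradient accumulation, and a hypothetical training-set size of 16,000 examples. The function name `linear_lr` is also illustrative.

```python
import math


def linear_lr(step, total_steps, base_lr=2e-5, warmup_steps=0):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Values from the hyperparameter list above
num_epochs = 3
train_batch_size = 16

# Hypothetical dataset size, chosen only to make the arithmetic easy to follow
num_examples = 16_000

steps_per_epoch = math.ceil(num_examples / train_batch_size)
total_steps = num_epochs * steps_per_epoch

for step in (0, total_steps // 2, total_steps):
    print(step, linear_lr(step, total_steps))
```

Under these assumptions the rate starts at the configured 2e-05, is halved at the midpoint of training, and reaches zero at the final step.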

Framework versions

Transformers 4.35.0
PyTorch 2.1.0+cu118
Datasets 2.14.6
Tokenizers 0.14.1
