ModernBERT-base-biomedical-ner

Model Details

Model Description

ModernBERT-base-biomedical-ner is a fine-tuned version of ModernBERT-base, a modernized encoder in the BERT (Bidirectional Encoder Representations from Transformers) family. It is adapted for Named Entity Recognition (NER) in the biomedical domain and was fine-tuned on the EMBO SourceData dataset, a large biomedical corpus built for machine learning and AI applications in scientific publishing.

Named Entity Recognition is a core natural language processing task. In the biomedical field, identifying and classifying entities such as genes, proteins, and diseases underpins applications including information retrieval, knowledge extraction, and data mining.

Intended Use

The ModernBERT-base-biomedical-ner model is intended for NER tasks within the biomedical domain. It can be used for a range of applications, including but not limited to:

  • Identifying and extracting biomedical entities (e.g., genes, proteins, diseases) from unstructured text.
  • Enhancing information retrieval systems for scientific literature.
  • Supporting knowledge extraction and data mining from biomedical literature.
  • Facilitating the creation of structured biomedical databases.
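For the database-building use case, aggregated predictions can be collapsed into one record per document. The sketch below assumes the list-of-dicts shape that the Transformers token-classification pipeline returns when an aggregation strategy is set (each dict has `entity_group`, `word`, `score`, `start`, `end`). The helper `entities_by_type` and the sample predictions are hypothetical and are not output from this model.

```python
from collections import defaultdict


def entities_by_type(results):
    """Group aggregated NER predictions into {label: [unique surface forms]}.

    Hypothetical helper. `results` is assumed to be the list of dicts returned
    by a Transformers token-classification pipeline with an aggregation
    strategy; only the "entity_group" and "word" keys are used here.
    """
    grouped = defaultdict(list)
    for ent in results:
        word = ent["word"].strip()  # byte-level BPE output may carry a leading space
        if word not in grouped[ent["entity_group"]]:
            grouped[ent["entity_group"]].append(word)  # keep first-seen order
    return dict(grouped)


# Hypothetical predictions for illustration only (not real model output).
sample = [
    {"entity_group": "SMALL_MOLECULE", "word": " aspirin", "score": 0.99},
    {"entity_group": "CELL_TYPE", "word": " T-helper cells", "score": 0.97},
    {"entity_group": "SMALL_MOLECULE", "word": "aspirin", "score": 0.98},
]
print(entities_by_type(sample))
# → {'SMALL_MOLECULE': ['aspirin'], 'CELL_TYPE': ['T-helper cells']}
```

Lists are used instead of sets so that each entity type keeps the order in which its mentions first appear; switch to sets if order does not matter for your database.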

Labels

| Label | Description |
|---|---|
| SMALL_MOLECULE | Small molecules |
| GENEPROD | Gene products (genes and proteins) |
| SUBCELLULAR | Subcellular components |
| CELL_LINE | Cell lines |
| CELL_TYPE | Cell types |
| TISSUE | Tissues and organs |
| ORGANISM | Species |
| DISEASE | Diseases |
| EXP_ASSAY | Experimental assays |

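A token-classification model predicts one tag per token rather than one per entity. Assuming the raw labels follow the common IOB2 convention (for example `B-GENEPROD` opens a gene-product span, `I-GENEPROD` continues it, and `O` marks tokens outside any entity; check `model.config.id2label` to confirm the actual label names), the sketch below shows how a tag sequence becomes labelled spans. `decode_iob2` is a hypothetical illustration, not part of this model or of Transformers, and it uses whole words as tokens for readability, whereas the real model operates on sub-word tokens.

```python
def decode_iob2(tokens, tags):
    """Merge per-token IOB2 tags into (label, text) spans.

    Assumes tags look like "B-LABEL", "I-LABEL" or "O". An "I-" tag that does
    not continue a span of the same label is treated as the start of a new
    span (a common lenient convention).
    """
    spans = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            prefix, label = "O", None
        else:
            prefix, label = tag.split("-", 1)
        if prefix == "B" or (prefix == "I" and label != current_label):
            # Close any open span, then start a new one.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = label, [token]
        elif prefix == "I":
            current_tokens.append(token)  # continue the open span
        else:
            # "O" closes any open span.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


# Hypothetical tags for illustration only (not real model output).
tokens = ["tumor", "necrosis", "factor", "alpha", "in", "HeLa", "cells"]
tags = ["B-GENEPROD", "I-GENEPROD", "I-GENEPROD", "I-GENEPROD", "O", "B-CELL_LINE", "O"]
print(decode_iob2(tokens, tags))
# → [('GENEPROD', 'tumor necrosis factor alpha'), ('CELL_LINE', 'HeLa')]
```

In practice the pipeline's `aggregation_strategy` option performs this merging for you; the sketch only makes the underlying tag-to-span step explicit.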
Source of label information: EMBO/SourceData Dataset

Usage

```python
# Requires a transformers release with ModernBERT support (v4.48.0 or later).
# The repository is gated: accept its conditions on the Hugging Face Hub and
# authenticate (e.g. with `huggingface-cli login`) before downloading.
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
import pandas as pd

model_id = "Kushtrim/ModernBERT-base-biomedical-ner"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# aggregation_strategy="first" merges sub-word tokens into words (each word takes
# the label of its first sub-token) and groups adjacent words into entity spans.
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")

text = """In a recent study, researchers investigated the effect of aspirin on gene expression in tumor necrosis factor alpha signaling pathways. The compound was observed to localize within the mitochondrial matrix of T-helper cells, which are crucial for adaptive immunity. Tissue samples from the pulmonary epithelium of Mus musculus were analyzed using RNA sequencing to quantify transcriptomic changes. The results showed a notable decrease in markers associated with rheumatoid arthritis progression. These effects were validated in the HeLa cells, confirming the role of aspirin in modulating inflammatory gene networks."""

results = ner(text)
# One row per entity: entity_group, score, word, start, end
df = pd.DataFrame.from_records(results)
print(df)
```
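Each aggregated result carries character offsets (`start`, `end`) into the input text, so predictions can be filtered by confidence and rendered inline for quick review. The sketch below is a minimal illustration under that assumption; the helper `annotate`, its `min_score` threshold of 0.5, and the sample predictions are hypothetical and not tuned for or produced by this model.

```python
def annotate(text, results, min_score=0.5):
    """Return `text` with each confident entity tagged inline as "span [LABEL]".

    Hypothetical helper. `results` is assumed to be aggregated pipeline output:
    dicts with "entity_group", "score", and character offsets "start"/"end"
    into `text`. `min_score` is an arbitrary illustrative threshold.
    """
    kept = [r for r in results if r["score"] >= min_score]
    # Insert from the end of the string so earlier offsets stay valid.
    for r in sorted(kept, key=lambda r: r["start"], reverse=True):
        text = text[: r["end"]] + f" [{r['entity_group']}]" + text[r["end"]:]
    return text


example_text = "Aspirin was tested in HeLa cells."
example_results = [  # hypothetical predictions, for illustration only
    {"entity_group": "SMALL_MOLECULE", "score": 0.98, "start": 0, "end": 7},
    {"entity_group": "CELL_LINE", "score": 0.91, "start": 22, "end": 26},
    {"entity_group": "EXP_ASSAY", "score": 0.30, "start": 12, "end": 18},
]
print(annotate(example_text, example_results))
# → Aspirin [SMALL_MOLECULE] was tested in HeLa [CELL_LINE] cells.
```

With real output, call `annotate(text, results)` on the `text` and `results` from the usage example. Working from the last entity backwards avoids recomputing offsets after each insertion.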
Model size: 150M parameters (Safetensors, tensor type F32)