🧬 miRNA-BioBERT: Fine-Tuned BioBERT for miRNA Sentence Classification

Fine-tuned BioBERT model for classifying miRNA-related sentences in biomedical research papers.


📌 Overview

miRNA-BioBERT is a fine-tuned version of BioBERT, trained specifically for classifying sentences as miRNA-related (relevant) or not (irrelevant). The model is useful for automating literature reviews, extracting relevant sentences, and identifying key insights in genomic research.

✔ Base Model: dmis-lab/biobert-base-cased-v1.1
✔ Fine-tuning Method: LoRA (Low-Rank Adaptation)
✔ Dataset: Curated biomedical text corpus containing labeled miRNA-relevant and non-relevant sentences
✔ Task: Binary classification (1 = relevant/functional, 0 = irrelevant/non-functional)
✔ Trained on: RTX A6000 GPU (5 epochs, batch size 32, learning rate 2e-5)

🚀 How to Use the Model

1️⃣ Install Dependencies

pip install transformers torch

2️⃣ Load the Model and Tokenizer

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model_name = "debjit20504/miRNA-biobert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Move the model to MPS (Apple Silicon), CUDA, or CPU, whichever is available
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

3️⃣ Classify a Sentence

def classify_text(text):
    # Tokenize the sentence and run a single forward pass without gradients
    inputs = tokenizer(text, return_tensors="pt", truncation=True).to(device)
    with torch.no_grad():
        output = model(**inputs)
        label = torch.argmax(output.logits, dim=1).item()
    return "Functional" if label == 1 else "Non-functional"

# Example test
sample_text = "mRNA translation is regulated by miRNAs."
print(f"Classification: {classify_text(sample_text)}")
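When scoring many sentences (for example, a whole abstract or a corpus dump), batching inputs through the tokenizer is usually faster than calling classify_text in a loop. The helper below is not part of the original card, just an illustrative sketch: classify_batch and the example sentences are made up here, and the softmax confidence is added for convenience.

def classify_batch(texts, batch_size=32):
    # Hypothetical helper (not from the card): returns a (label, confidence) pair per sentence
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True).to(device)
        with torch.no_grad():
            probs = torch.softmax(model(**inputs).logits, dim=1)
        for label, conf in zip(probs.argmax(dim=1).tolist(), probs.max(dim=1).values.tolist()):
            results.append(("Functional" if label == 1 else "Non-functional", conf))
    return results

# Example: keep only sentences the model considers functional
sentences = [
    "miR-21 overexpression promotes tumour cell proliferation.",
    "Samples were centrifuged at 4 °C for 10 minutes.",
]
for sent, (label, conf) in zip(sentences, classify_batch(sentences)):
    print(f"{label} ({conf:.2f}): {sent}")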

📊 Training Details

  • Dataset: Biomedical text corpus with 429,785 miRNA-relevant sentences and 87,966 irrelevant sentences.
  • Fine-Tuning Method: LoRA (Low-Rank Adaptation) for efficient training (see the sketch after this list).
  • Training Hardware: NVIDIA RTX A6000 GPU.
  • Training Settings:
    • Batch size: 32
    • Learning rate: 2e-5
    • Optimizer: AdamW
    • Warmup steps: 1000
    • Epochs: 5
    • Mixed precision (fp16): ✅ Enabled for efficiency.
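A reproduction of this setup could look roughly like the sketch below, built on the peft, datasets, and transformers libraries (pip install peft datasets). Only the hyperparameters listed above come from this card; the LoRA rank/alpha/dropout values and the tiny placeholder dataset are assumptions for illustration.

from datasets import Dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "dmis-lab/biobert-base-cased-v1.1"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# Wrap the base model with a LoRA adapter; r/alpha/dropout are illustrative, not values from this card
lora_config = LoraConfig(task_type=TaskType.SEQ_CLS, r=8, lora_alpha=16, lora_dropout=0.1)
model = get_peft_model(model, lora_config)

# Tiny placeholder dataset -- replace with the real labeled miRNA sentence corpus
raw = Dataset.from_dict({
    "text": ["miR-155 regulates immune cell differentiation.",
             "The buffer was prepared fresh daily."],
    "label": [1, 0],
})
train_dataset = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128),
    batched=True,
)

# Hyperparameters from the card: batch size 32, lr 2e-5, AdamW (Trainer default),
# 1000 warmup steps, 5 epochs, fp16 (requires a CUDA GPU)
training_args = TrainingArguments(
    output_dir="mirna-biobert-lora",
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    warmup_steps=1000,
    num_train_epochs=5,
    fp16=True,
)

trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()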

📖 Model Applications

✅ Biomedical NLP – Extracting meaningful information from biomedical literature.
✅ miRNA Research – Identifying sentences discussing miRNA mechanisms.
✅ Automated Literature Review – Filtering relevant studies efficiently.
✅ Genomics & Bioinformatics – Enhancing data retrieval from scientific texts.


📬 Contact

For any questions or collaborations, reach out via:

📧 Email: debjit.pramanik@postgrad.manchester.ac.uk
🔗 LinkedIn: https://www.linkedin.com/in/debjit-pramanik-88a837171/
