🧬 miRNA-BioBERT: Fine-Tuned BioBERT for miRNA Sentence Classification

Fine-tuned BioBERT model for classifying miRNA-related sentences in biomedical research papers.


📌 Overview

miRNA-BioBERT is a fine-tuned version of BioBERT, trained specifically for classifying sentences as miRNA-related (relevant) or not (irrelevant). The model is useful for automating literature reviews, extracting relevant sentences, and identifying key insights in genomic research.

✔ Base Model: dmis-lab/biobert-base-cased-v1.1
✔ Fine-tuning Method: LoRA (Low-Rank Adaptation)
✔ Dataset: Curated biomedical text corpus containing labeled miRNA-relevant and non-relevant sentences
✔ Task: Binary classification (1 = relevant/functional, 0 = irrelevant/non-functional)
✔ Trained on: RTX A6000 GPU (5 epochs, batch size 32, learning rate 2e-5)

🚀 How to Use the Model

1️⃣ Install Dependencies

pip install transformers torch

2️⃣ Load the Model and Tokenizer

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model_name = "debjit20504/miRNA-biobert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Move the model to MPS (Apple Silicon), CUDA, or CPU, whichever is available
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

3️⃣ Classify a Sentence

def classify_text(text):
    # Tokenize the sentence and run a single forward pass without gradients
    inputs = tokenizer(text, return_tensors="pt", truncation=True).to(device)
    with torch.no_grad():
        output = model(**inputs)
        label = torch.argmax(output.logits, dim=1).item()
    return "Functional" if label == 1 else "Non-functional"

# Example test
sample_text = "mRNA translation is regulated by miRNAs."
print(f"Classification: {classify_text(sample_text)}")
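When scoring many sentences (for example, a whole abstract or a corpus dump), batching inputs through the tokenizer is usually faster than calling classify_text in a loop. The helper below is not part of the original card, just an illustrative sketch: classify_batch and the example sentences are made up here, and the softmax confidence is added for convenience.

def classify_batch(texts, batch_size=32):
    # Hypothetical helper (not from the card): returns a (label, confidence) pair per sentence
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True).to(device)
        with torch.no_grad():
            probs = torch.softmax(model(**inputs).logits, dim=1)
        for label, conf in zip(probs.argmax(dim=1).tolist(), probs.max(dim=1).values.tolist()):
            results.append(("Functional" if label == 1 else "Non-functional", conf))
    return results

# Example: keep only sentences the model considers functional
sentences = [
    "miR-21 overexpression promotes tumour cell proliferation.",
    "Samples were centrifuged at 4 °C for 10 minutes.",
]
for sent, (label, conf) in zip(sentences, classify_batch(sentences)):
    print(f"{label} ({conf:.2f}): {sent}")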

📊 Training Details

  • Dataset: Biomedical text corpus with 429,785 miRNA-relevant sentences and 87,966 irrelevant sentences.
  • Fine-Tuning Method: LoRA (Low-Rank Adaptation) for efficient training (see the sketch after this list).
  • Training Hardware: NVIDIA RTX A6000 GPU.
  • Training Settings:
    • Batch size: 32
    • Learning rate: 2e-5
    • Optimizer: AdamW
    • Warmup steps: 1000
    • Epochs: 5
    • Mixed precision (fp16): ✅ Enabled for efficiency.
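A reproduction of this setup could look roughly like the sketch below, built on the peft, datasets, and transformers libraries (pip install peft datasets). Only the hyperparameters listed above come from this card; the LoRA rank/alpha/dropout values and the tiny placeholder dataset are assumptions for illustration.

from datasets import Dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "dmis-lab/biobert-base-cased-v1.1"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# Wrap the base model with a LoRA adapter; r/alpha/dropout are illustrative, not values from this card
lora_config = LoraConfig(task_type=TaskType.SEQ_CLS, r=8, lora_alpha=16, lora_dropout=0.1)
model = get_peft_model(model, lora_config)

# Tiny placeholder dataset -- replace with the real labeled miRNA sentence corpus
raw = Dataset.from_dict({
    "text": ["miR-155 regulates immune cell differentiation.",
             "The buffer was prepared fresh daily."],
    "label": [1, 0],
})
train_dataset = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128),
    batched=True,
)

# Hyperparameters from the card: batch size 32, lr 2e-5, AdamW (Trainer default),
# 1000 warmup steps, 5 epochs, fp16 (requires a CUDA GPU)
training_args = TrainingArguments(
    output_dir="mirna-biobert-lora",
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    warmup_steps=1000,
    num_train_epochs=5,
    fp16=True,
)

trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset, tokenizer=tokenizer)
trainer.train()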

📖 Model Applications

✅ Biomedical NLP – Extracting meaningful information from biomedical literature.
✅ miRNA Research – Identifying sentences discussing miRNA mechanisms.
✅ Automated Literature Review – Filtering relevant studies efficiently.
✅ Genomics & Bioinformatics – Enhancing data retrieval from scientific texts.


📬 Contact

For any questions or collaborations, reach out via:

📧 Email: debjit.pramanik@postgrad.manchester.ac.uk
🔗 LinkedIn: https://www.linkedin.com/in/debjit-pramanik-88a837171/
