𧬠miRNA-BioBERT: Fine-Tuned BioBERT for miRNA Sentence Classification
Fine-tuned BioBERT model for classifying miRNA-related sentences in biomedical research papers.
π Overview
miRNA-BioBERT is a fine-tuned version of BioBERT, trained specifically for classifying sentences as miRNA-related (relevant) or not (irrelevant). The model is useful for automating literature reviews, extracting relevant sentences, and identifying key insights in genomic research.
- Base Model: dmis-lab/biobert-base-cased-v1.1
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Dataset: Curated biomedical text corpus containing labeled miRNA-relevant and non-relevant sentences
- Task: Binary classification (1 = functional, 0 = non-functional)
- Trained on: RTX A6000 GPU (5 epochs, batch size 32, learning rate 2e-5)
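To illustrate why LoRA keeps fine-tuning cheap, the sketch below counts trainable parameters for a single 768 x 768 projection matrix (768 is the hidden size of BERT-base architectures such as BioBERT). The rank of 8 is a hypothetical value chosen for illustration; the rank actually used for miRNA-BioBERT is not stated here.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    LoRA freezes the d_out x d_in matrix W and learns a low-rank update B @ A,
    where B is d_out x rank and A is rank x d_in.
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


HIDDEN_SIZE = 768  # hidden size of BERT-base architectures
RANK = 8           # assumption for illustration, not the model's actual rank

full, lora = lora_param_counts(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
print(f"Full matrix: {full:,} parameters")
print(f"LoRA update: {lora:,} parameters ({lora / full:.2%} of full)")
```

Under this assumed rank, the low-rank update trains only a small fraction of the weights that full fine-tuning of the same matrix would touch.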
How to Use the Model
1. Install Dependencies
pip install transformers torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
# Load the model and tokenizer
model_name = "debjit20504/miRNA-biobert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Move model to GPU or MPS (for Mac)
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()
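def classify_text(text):
    """Classify one sentence as "functional" (label 1) or "non-functional" (label 0)."""
    # Truncate to BERT's 512-token limit so long inputs do not raise an error
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(device)
    with torch.no_grad():
        output = model(**inputs)
    label = torch.argmax(output.logits, dim=1).item()
    return "functional" if label == 1 else "non-functional"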
# Example Test
sample_text = "mRNA translation is regulated by miRNAs."
print(f"Classification: {classify_text(sample_text)}")
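The `classify_text` function above returns only a hard label. If a confidence score is also wanted, the two logits can be passed through a softmax. The sketch below is written in plain Python so it runs without loading the model; `logits_to_prediction` and its `threshold` parameter are hypothetical names introduced here, and the example logits are made up.

```python
import math


def logits_to_prediction(logits, threshold=0.5):
    """Convert two-class logits [non-functional, functional] to (label, probability)."""
    # Numerically stable softmax: subtract the max logit before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    p_functional = exps[1] / sum(exps)
    label = "functional" if p_functional >= threshold else "non-functional"
    return label, p_functional


# Made-up logits; with the real model, use output.logits[0].tolist()
label, prob = logits_to_prediction([-1.2, 2.3])
print(f"{label} (p = {prob:.3f})")
```

Raising `threshold` above 0.5 trades recall for precision, which may suit a literature filter where false positives are costly.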
Training Details
- Dataset: Biomedical text dataset with 429,785 relevant sentences and 87,966 irrelevant sentences.
- Fine-Tuning Method: LoRA (Low-Rank Adaptation) for efficient training.
- Training Hardware: NVIDIA RTX A6000 GPU.
- Training Settings:
  - Batch size: 32
  - Learning rate: 2e-5
  - Optimizer: AdamW
  - Warmup steps: 1000
  - Epochs: 5
- Mixed precision (fp16): Enabled for efficiency.
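As a rough check on the settings above, the following sketch derives the number of optimizer steps and the class balance from the reported figures. It assumes that every labeled sentence was used for training, with no held-out split and no gradient accumulation; neither is stated here, so the real numbers may differ.

```python
import math

N_RELEVANT = 429_785
N_IRRELEVANT = 87_966
BATCH_SIZE = 32
EPOCHS = 5
WARMUP_STEPS = 1000

# Assumption: every labeled sentence is a training example
n_total = N_RELEVANT + N_IRRELEVANT
steps_per_epoch = math.ceil(n_total / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_fraction = WARMUP_STEPS / total_steps

# Share of the majority class: the accuracy of always predicting "relevant"
majority_share = N_RELEVANT / n_total

print(f"Total sentences:   {n_total:,}")
print(f"Steps per epoch:   {steps_per_epoch:,}")
print(f"Total steps:       {total_steps:,}")
print(f"Warmup fraction:   {warmup_fraction:.2%}")
print(f"Majority baseline: {majority_share:.1%}")
```

Because the classes are imbalanced, accuracy alone can look strong even for a trivial classifier; per-class precision and recall are more informative when evaluating the model.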
Model Applications
- Biomedical NLP: Extracting meaningful information from biomedical literature.
- miRNA Research: Identifying sentences discussing miRNA mechanisms.
- Automated Literature Review: Filtering relevant studies efficiently.
- Genomics & Bioinformatics: Enhancing data retrieval from scientific texts.
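The literature-review use case can be sketched as a small filter that splits an abstract into sentences and keeps those the classifier marks as functional. The `filter_sentences` helper, the regex-based sentence splitter, the keyword stub standing in for the model, and the example abstract are all assumptions for illustration; in practice `classify` would be the `classify_text` function from the usage section.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def filter_sentences(text: str, classify: Callable[[str], str]) -> List[str]:
    """Keep sentences whose predicted label is "functional" (case-insensitive)."""
    return [s for s in split_sentences(text) if classify(s).lower() == "functional"]


def keyword_stub(sentence: str) -> str:
    """Stand-in for the model so the example runs offline (not the real classifier)."""
    return "functional" if "mir-" in sentence.lower() else "non-functional"


# Made-up abstract for demonstration
abstract = (
    "We profiled 40 tumour samples. "
    "miR-21 suppresses PTEN expression in these cells. "
    "Samples were stored at -80 C."
)
for sentence in filter_sentences(abstract, keyword_stub):
    print(sentence)
```

A regex splitter will mis-handle abbreviations such as "et al." or "Fig. 2", so a dedicated sentence segmenter is likely a better choice for real papers.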
Contact
For any questions or collaborations, reach out via:
Email: debjit.pramanik@postgrad.manchester.ac.uk
LinkedIn: https://www.linkedin.com/in/debjit-pramanik-88a837171/