---
tags:
- text-classification
- transformers
- biobert
- miRNA
- biomedical
- LoRA
- fine-tuning
library_name: transformers
datasets:
- custom-biomedical-dataset
license: apache-2.0
---
# miRNA-BioBERT: Fine-Tuned BioBERT for miRNA Sentence Classification

**A fine-tuned BioBERT model for classifying miRNA-related sentences in biomedical research papers.**
---
## Overview

**miRNA-BioBERT** is a fine-tuned version of [BioBERT](https://huggingface.co/dmis-lab/biobert-base-cased-v1.1), trained to classify sentences as **miRNA-related (relevant)** or **not (irrelevant)**. The model is useful for **automating literature reviews**, **extracting relevant sentences**, and **identifying key insights** in genomic research.
- **Base Model**: `dmis-lab/biobert-base-cased-v1.1`
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Dataset**: curated biomedical text corpus containing labeled miRNA-relevant and non-relevant sentences
- **Task**: binary sentence classification (1 = relevant, 0 = not relevant)
- **Training**: RTX A6000 GPU, 5 epochs, batch size 32, learning rate 2e-5
---

## Model Applications

- **Biomedical NLP**: extracting meaningful information from biomedical literature.
- **miRNA Research**: identifying sentences that discuss miRNA mechanisms.
- **Automated Literature Review**: filtering relevant studies efficiently.
- **Genomics & Bioinformatics**: enhancing data retrieval from scientific texts.
---

## How to Use the Model

### 1. Install Dependencies

```bash
pip install transformers torch
```
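For a quick check without any boilerplate, the `pipeline` API also works. A minimal sketch; note that predictions may display as `LABEL_0`/`LABEL_1` unless `id2label` is set in the model config:

```python
from transformers import pipeline

# One-line classifier; downloads the model on first use
classifier = pipeline("text-classification", model="debjit20504/miRNA-biobert")
print(classifier("miR-155 regulates immune cell differentiation."))
```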
### 2. Classify Sentences

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the model and tokenizer
model_name = "debjit20504/miRNA-biobert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Move the model to the best available device: MPS (Mac), CUDA, or CPU
device = torch.device(
    "mps" if torch.backends.mps.is_available()
    else "cuda" if torch.cuda.is_available()
    else "cpu"
)
model.to(device)
model.eval()

def classify_text(text):
    # Tokenize the sentence, truncating to the model's maximum input length
    inputs = tokenizer(text, return_tensors="pt", truncation=True).to(device)
    with torch.no_grad():
        output = model(**inputs)
    label = torch.argmax(output.logits, dim=1).item()
    return "Relevant (miRNA-related)" if label == 1 else "Not Relevant"

# Example test
sample_text = "mRNA translation is regulated by miRNAs."
print(f"Classification: {classify_text(sample_text)}")
```
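The snippet above classifies one sentence at a time. For literature-scale filtering, batching sentences into a single padded forward pass is usually much faster. A minimal sketch, reusing the `tokenizer`, `model`, and `device` objects defined above (the example sentences are illustrative, not taken from the training data):

```python
# Classify several sentences in one forward pass
sentences = [
    "miR-21 overexpression promotes tumor growth in hepatocellular carcinoma.",
    "Samples were centrifuged at 4000 rpm for 10 minutes.",
]
inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True).to(device)
with torch.no_grad():
    logits = model(**inputs).logits
preds = torch.argmax(logits, dim=1).tolist()

# Keep only the miRNA-relevant sentences (label 1)
relevant = [s for s, p in zip(sentences, preds) if p == 1]
print(relevant)
```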
---

## Training Details
|
|
- Dataset: Biomedical text dataset with 429,785 relevant sentences and 87,966 irrelevant sentences.
|
|
- Fine-Tuning Method: LoRA (Low-Rank Adaptation) for efficient training.
|
|
- Training Hardware: NVIDIA RTX A6000 GPU.
|
|
- Training Settings:
|
|
- Batch size: 32
|
|
- Learning rate: 2e-5
|
|
- Optimizer: AdamW
|
|
- Warmup steps: 1000
|
|
- Epochs: 5
|
|
- Mixed precision (fp16): β
Enabled for efficiency.
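The sketch below shows how this setup could be wired up with `peft` and the Hugging Face `Trainer`. The batch size, learning rate, warmup steps, epochs, and fp16 flag come from the settings above; the LoRA rank, alpha, dropout, and the dataset loading are assumptions, since the card does not specify them (requires `pip install peft` in addition to the dependencies above):

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from peft import LoraConfig, TaskType, get_peft_model

base = "dmis-lab/biobert-base-cased-v1.1"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# Wrap the base model with LoRA adapters (r/alpha/dropout are assumed values)
lora_config = LoraConfig(task_type=TaskType.SEQ_CLS, r=8, lora_alpha=16, lora_dropout=0.1)
model = get_peft_model(model, lora_config)

# Hyperparameters taken from the card; AdamW is the Trainer's default optimizer
args = TrainingArguments(
    output_dir="miRNA-biobert",
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    warmup_steps=1000,
    num_train_epochs=5,
    fp16=True,
)

# trainer = Trainer(model=model, args=args, train_dataset=train_ds)  # train_ds: tokenized, labeled sentences
# trainer.train()
```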
---

## Contact

For questions or collaborations, reach out via:
- **Email**: debjit.pramanik@postgrad.manchester.ac.uk
- **LinkedIn**: https://www.linkedin.com/in/debjit-pramanik-88a837171/