AMP GenPept Binary Classifier (ESM-2 650M + LoRA)
Fine-tuned ESM-2 650M with LoRA for binary antimicrobial peptide (AMP) classification.
Performance
- F1: 0.883 (88.3%)
- Accuracy: 0.868 (86.8%)
- Benchmark: GenPept-Curated-2025 (11K sequences, 80/20 split)
- Training: 5 epochs on an NVIDIA RTX A6000 (48 GB), ~40 min
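F1 and accuracy are both computed from thresholded predictions. The plain-Python sketch below shows that arithmetic. It assumes a decision threshold of 0.5 on the sigmoid probability and labels where 1 = AMP, because the card does not state the threshold used. `binary_metrics` and the toy inputs are hypothetical and are not the evaluation script behind the numbers above.

```python
from typing import Sequence


def binary_metrics(labels: Sequence[int], probs: Sequence[float], threshold: float = 0.5) -> dict:
    """Accuracy and F1 for binary labels (1 = AMP, 0 = non-AMP).

    The 0.5 threshold is an assumption, not taken from the model card.
    """
    if len(labels) != len(probs):
        raise ValueError("labels and probs must have the same length")
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    accuracy = (tp + tn) / len(labels)
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "tp": tp, "fp": fp, "fn": fn, "tn": tn}


# Toy example: 4 AMPs and 4 non-AMPs with made-up probabilities.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.92, 0.81, 0.64, 0.30, 0.12, 0.45, 0.71, 0.05]
print(binary_metrics(labels, probs))
# {'accuracy': 0.75, 'f1': 0.75, 'tp': 3, 'fp': 1, 'fn': 1, 'tn': 3}
```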
Usage
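```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch

# Base ESM-2 with a single-logit head, then the LoRA adapter on top.
# Assumes the adapter checkpoint includes the trained classifier head; otherwise it stays randomly initialized.
model = AutoModelForSequenceClassification.from_pretrained("facebook/esm2_t33_650M_UR50D", num_labels=1)
model = PeftModel.from_pretrained(model, "null-phnix/amp-genpept-esm2-650m-lora")
model.eval()  # inference mode: disables dropout
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")

def predict_amp(sequence: str) -> float:
    """Return AMP probability for a peptide sequence."""
    # max_length=200 counts the <cls> and <eos> tokens added by the ESM tokenizer.
    inputs = tokenizer(sequence, return_tensors="pt", truncation=True, padding="max_length", max_length=200)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape (1, 1): one logit per sequence
    return torch.sigmoid(logits).item()

print(predict_amp("GLFDVIKKVAGALGSLVK"))
```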
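`predict_amp` scores one sequence at a time and pads it to 200 tokens. For many peptides it is usually cheaper to batch them. The pure-Python helper below is a hypothetical sketch (`prepare_batches`, `normalize`, and `batch_size=32` are assumptions, not part of this model card). It uppercases sequences, strips whitespace such as FASTA line wrapping, and flags any sequence longer than 198 residues. That limit is 200 tokens minus the `<cls>` and `<eos>` tokens the ESM tokenizer adds, so longer inputs would be truncated.

```python
from typing import List, Tuple

MAX_TOKENS = 200                            # same max_length as predict_amp
SPECIAL_TOKENS = 2                          # <cls> + <eos> added by the ESM tokenizer
MAX_RESIDUES = MAX_TOKENS - SPECIAL_TOKENS  # 198 residues survive truncation


def normalize(sequence: str) -> str:
    """Uppercase and drop all whitespace (e.g. from FASTA line wrapping)."""
    return "".join(sequence.split()).upper()


def prepare_batches(sequences: List[str], batch_size: int = 32) -> Tuple[List[List[str]], List[int]]:
    """Split normalized sequences into batches and report which would be truncated.

    Returns (batches, truncated_indices); indices refer to positions in the input list.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    cleaned = [normalize(s) for s in sequences]
    truncated = [i for i, s in enumerate(cleaned) if len(s) > MAX_RESIDUES]
    batches = [cleaned[i:i + batch_size] for i in range(0, len(cleaned), batch_size)]
    return batches, truncated


# Toy inputs: mixed case, an internal space, and one over-long sequence.
seqs = ["glfdvikkvagalgslvk", "KWKLFKKI GFGAVLKV", "A" * 250]
batches, truncated = prepare_batches(seqs, batch_size=2)
print(batches[0])  # ['GLFDVIKKVAGALGSLVK', 'KWKLFKKIGFGAVLKV']
print(truncated)   # [2]
```

Each batch can then go to `tokenizer(batch, return_tensors="pt", padding=True, truncation=True, max_length=200)`. `torch.sigmoid(model(**inputs).logits).squeeze(-1)` gives one probability per sequence. With `padding=True`, each batch is padded only to its longest member rather than to 200 tokens.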
Architecture
- Base: ESM-2 650M (33 transformer layers, 1280 hidden dim)
- Adapter: LoRA r=16, alpha=32, target_modules=["query","value"]
- Head: Single-logit output; a sigmoid maps it to an AMP probability for binary classification
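The adapter settings above fix both the LoRA scaling factor and the number of adapter weights. Here is a back-of-the-envelope check. It assumes `query` and `value` are 1280→1280 linear layers in each of the 33 layers, as in ESM-2 650M. It counts only the LoRA A/B matrices and leaves out the classification head, whose saved size depends on how the adapter was exported. The helper name is hypothetical.

```python
def lora_params_per_module(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) beside a frozen d_out x d_in weight."""
    return r * d_in + d_out * r


NUM_LAYERS = 33              # ESM-2 650M transformer layers
HIDDEN = 1280                # query/value project 1280 -> 1280
R, ALPHA = 16, 32            # adapter hyperparameters listed above
TARGETS = ["query", "value"]

per_module = lora_params_per_module(HIDDEN, HIDDEN, R)
total = NUM_LAYERS * len(TARGETS) * per_module
scaling = ALPHA / R          # adapted layer computes W0 x + (alpha / r) * B A x

print(per_module)  # 40960
print(total)       # 2703360
print(scaling)     # 2.0
```

This comes to roughly 2.7M adapter weights, about 0.4% of the ~650M frozen base parameters.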
Data
Trained on GenPept-Curated-2025, a balanced, leakage-free AMP benchmark.
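"Balanced" and "leakage-free" can be spot-checked on any train/test split. The sketch below is a hypothetical check, not the curation pipeline used for GenPept-Curated-2025. It reports the AMP fraction in each split and lists exact sequences shared between splits. Exact-match overlap is the weakest form of leakage. Stricter benchmarks often also remove near-duplicates by sequence identity, which this sketch does not attempt.

```python
from typing import Dict, List, Tuple

Example = Tuple[str, int]  # (peptide sequence, label: 1 = AMP, 0 = non-AMP)


def positive_fraction(split: List[Example]) -> float:
    """Share of AMP-labelled examples; 0.5 means perfectly balanced."""
    return sum(label for _, label in split) / len(split) if split else 0.0


def exact_overlap(train: List[Example], test: List[Example]) -> List[str]:
    """Sequences (case-insensitive) present in both splits, sorted for stable output."""
    train_seqs = {seq.upper() for seq, _ in train}
    return sorted({seq.upper() for seq, _ in test} & train_seqs)


def split_report(train: List[Example], test: List[Example]) -> Dict[str, object]:
    """Class balance per split plus exact train/test overlap."""
    return {
        "train_pos_frac": positive_fraction(train),
        "test_pos_frac": positive_fraction(test),
        "shared_sequences": exact_overlap(train, test),
    }


# Toy splits with made-up labels; the last test peptide duplicates a training one.
train = [("GLFDVIKKVAGALGSLVK", 1), ("MSTNPKPQRKTK", 0), ("KWKLFKKIGFGAVLKV", 1), ("AEEKPAAGSD", 0)]
test = [("FLPLIGRVLSGIL", 1), ("glfdvikkvagalgslvk", 1)]
print(split_report(train, test))
# {'train_pos_frac': 0.5, 'test_pos_frac': 1.0, 'shared_sequences': ['GLFDVIKKVAGALGSLVK']}
```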
Links
- Adapter: null-phnix/amp-genpept-esm2-650m-lora
- Base model: facebook/esm2_t33_650M_UR50D