metadata
license: mit
language:
  - en
library_name: transformers
tags:
  - esm
  - esm2
  - protein language model
  - biology

ESM-2 (esm2_t6_8M_UR50D)

This is a version of ESM-2 fine-tuned for sequence classification. It assigns each protein sequence to one of two classes: "cytosolic" or "membrane".

To use the model, try running:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Initialize the tokenizer and model
model_path_directory = "AmelieSchreiber/esm2_t6_8M_UR50D-finetuned-localization"
tokenizer = AutoTokenizer.from_pretrained(model_path_directory)
model = AutoModelForSequenceClassification.from_pretrained(model_path_directory)

# Define a function to predict the category of a protein sequence
def predict_category(sequence):
    # Tokenize the sequence and convert it to tensor format
    inputs = tokenizer(sequence, return_tensors="pt", truncation=True, max_length=512, padding="max_length")

    # Make prediction
    with torch.no_grad():
        logits = model(**inputs).logits

    # Determine the category with the highest score
    predicted_class = torch.argmax(logits, dim=1).item()

    # Return the category: 0 for cytosolic, 1 for membrane
    return "cytosolic" if predicted_class == 0 else "membrane"

# Example sequence
new_protein_sequence = "MTQRAGAAMLPSALLLLCVPGCLTVSGPSTVMGAVGESLSVQCRYEEKYKTFNKYWCRQPCLPIWHEMVETGGSEGVVRSDQVIITDHPGDLTFTVTLENLTADDAGKYRCGIATILQEDGLSGFLPDPFFQVQVLVSSASSTENSVKTPASPTRPSQCQGSLPSSTCFLLLPLLKVPLLLSILGAILWVNRPWRTPWTES"

# Predict the category
category = predict_category(new_protein_sequence)
print(f"The predicted category for the sequence is: {category}")
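
The snippet above returns only the winning class, so it does not show how close the two scores were. As a minimal sketch, the two logits can be turned into a probability with a softmax. The helper name `logits_to_prediction` and the example logits are hypothetical and used only for illustration; the label order (0 for cytosolic, 1 for membrane) is taken from the comment in the code above.

```python
import math

# Label order assumed from the snippet above: index 0 is cytosolic, index 1 is membrane.
LABELS = ("cytosolic", "membrane")

def logits_to_prediction(logits):
    """Convert a list of raw logits into (label, probability) using a softmax."""
    # Subtract the largest logit first so exp() cannot overflow.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the index with the highest probability (the same choice argmax makes).
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Hypothetical logits for illustration only; these are not real model output.
label, prob = logits_to_prediction([0.3, 1.7])
print(f"{label} (p = {prob:.2f})")
```

To apply this to the model, one option is to pass `logits[0].tolist()` from inside `predict_category` to the helper. A probability near 0.5 suggests the prediction should be treated with caution.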