Edit model card

ESM-2 (esm2_t6_8M_UR50D)

This is a fine-tuned version of ESM-2 for sequence classification that categorizes protein sequences into two classes, either "cystolic" or "membrane".

Training and Accuracy

The model is trained using this notebook and achieved an eval accuracy of 94.83163664839468 %.

Using the Model

To use try running:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Initialize the tokenizer and model
model_path_directory = "AmelieSchreiber/esm2_t6_8M_UR50D-finetuned-localization"
tokenizer = AutoTokenizer.from_pretrained(model_path_directory)
model = AutoModelForSequenceClassification.from_pretrained(model_path_directory)

# Define a function to predict the category of a protein sequence
def predict_category(sequence):
    # Tokenize the sequence and convert it to tensor format
    inputs = tokenizer(sequence, return_tensors="pt", truncation=True, max_length=512, padding="max_length")

    # Make prediction
    with torch.no_grad():
        logits = model(**inputs).logits

    # Determine the category with the highest score
    predicted_class = torch.argmax(logits, dim=1).item()

    # Return the category: 0 for cytosolic, 1 for membrane
    return "cytosolic" if predicted_class == 0 else "membrane"

# Example sequence

# Predict the category
category = predict_category(new_protein_sequence)
print(f"The predicted category for the sequence is: {category}")
Downloads last month
Model size
7.84M params
Tensor type