Upload 7 files

Files changed (7)

README.md +1 -72
adapter_config.json +22 -0
adapter_model.bin +3 -0
special_tokens_map.json +7 -0
tokenizer_config.json +5 -0
training_args.bin +3 -0
vocab.txt +33 -0

README.md CHANGED

@@ -1,80 +1,9 @@
 ---
 library_name: peft
-license: mit
-language:
-- en
-tags:
-- transformers
-- biology
-- esm
-- esm2
-- protein
-- protein language model
 ---
-# ESM-2 RNA Binding Site LoRA
-This is a Parameter Efficient Fine Tuning (PEFT) Low Rank Adaptation ([LoRA](https://huggingface.co/docs/peft/task_guides/token-classification-lora)) of
-the [esm2_t30_150M_UR50D](https://huggingface.co/facebook/esm2_t30_150M_UR50D) model for the (binary) token classification task of
-predicting RNA binding sites of proteins. The Github with the training script and conda env YAML can be
-[found here](https://github.com/Amelie-Schreiber/esm2_LoRA_binding_sites/tree/main). You can also find a version of this model
-that was fine-tuned without LoRA [here](https://huggingface.co/AmelieSchreiber/esm2_t6_8M_UR50D_rna_binding_site_predictor).
 ## Training procedure
-This is a Low Rank Adaptation (LoRA) of `esm2_t6_8M_UR50D`,
-trained on `166` protein sequences in the [RNA binding sites dataset](https://huggingface.co/datasets/AmelieSchreiber/data_of_protein-rna_binding_sites)
-using a `75/25` train/test split. It achieves an evaluation loss of `0.17312709987163544`.
 ### Framework versions
-- PEFT 0.4.0
-## Using the Model
-To use, try running:
-```python
-from transformers import AutoModelForTokenClassification, AutoTokenizer
-from peft import PeftModel
-import torch
-# Path to the saved LoRA model
-model_path = "AmelieSchreiber/esm2_t30_150M_UR50D_LoRA_RNA_binding"
-# ESM2 base model
-base_model_path = "facebook/esm2_t30_150M_UR50D"
-# Load the model
-base_model = AutoModelForTokenClassification.from_pretrained(base_model_path)
-loaded_model = PeftModel.from_pretrained(base_model, model_path)
-# Ensure the model is in evaluation mode
-loaded_model.eval()
-# Load the tokenizer
-loaded_tokenizer = AutoTokenizer.from_pretrained(base_model_path)
-# Protein sequence for inference
-protein_sequence = "MAVPETRPNHTIYINNLNEKIKKDELKKSLHAIFSRFGQILDILVSRSLKMRGQAFVIFKEVSSATNALRSMQGFPFYDKPMRIQYAKTDSDIIAKMKGT"  # Replace with your actual sequence
-# Tokenize the sequence
-inputs = loaded_tokenizer(protein_sequence, return_tensors="pt", truncation=True, max_length=1024, padding='max_length')
-# Run the model
-with torch.no_grad():
-    logits = loaded_model(**inputs).logits
-# Get predictions
-tokens = loaded_tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])  # Convert input ids back to tokens
-predictions = torch.argmax(logits, dim=2)
-# Define labels
-id2label = {
-    0: "No binding site",
-    1: "Binding site"
-}
-# Print the predicted labels for each token
-for token, prediction in zip(tokens, predictions[0].numpy()):
-    if token not in ['<pad>', '<cls>', '<eos>']:
-        print((token, id2label[prediction]))
-```

+- PEFT 0.4.0

adapter_config.json ADDED

	@@ -0,0 +1,22 @@

+{
+  "auto_mapping": null,
+  "base_model_name_or_path": "facebook/esm2_t30_150M_UR50D",
+  "bias": "all",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "lora_alpha": 16,
+  "lora_dropout": 0.1,
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "revision": null,
+  "target_modules": [
+    "query",
+    "key",
+    "value"
+  ],
+  "task_type": "TOKEN_CLS"
+}
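
As a rough sanity check on the adapter configuration above, the sketch below computes the LoRA scaling factor (`lora_alpha / r`) and estimates how many low-rank parameters the adapter adds. The hidden size (640) and layer count (30) are assumptions about `facebook/esm2_t30_150M_UR50D` that do not appear in this diff, and the helper names are hypothetical.

```python
import json

# Values copied from adapter_config.json in this commit.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 16,
  "r": 32,
  "target_modules": ["query", "key", "value"]
}
""")

# Assumptions (not stated in the diff): architecture of esm2_t30_150M_UR50D.
ASSUMED_HIDDEN_SIZE = 640
ASSUMED_NUM_LAYERS = 30


def lora_scaling(config: dict) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return config["lora_alpha"] / config["r"]


def lora_param_count(config: dict, hidden_size: int, num_layers: int) -> int:
    """Count parameters in the A (r x in) and B (out x r) matrices.

    Assumes every target module is a square hidden_size x hidden_size
    projection that occurs once per layer.
    """
    per_module = config["r"] * (hidden_size + hidden_size)
    return per_module * len(config["target_modules"]) * num_layers


scaling = lora_scaling(ADAPTER_CONFIG)
n_params = lora_param_count(ADAPTER_CONFIG, ASSUMED_HIDDEN_SIZE, ASSUMED_NUM_LAYERS)
print(f"scaling = {scaling}")
print(f"LoRA parameters = {n_params:,}")
```

Under these assumptions the low-rank matrices alone would take about 14.7 MB in float32, which is in the same range as the 15,750,833-byte `adapter_model.bin` below. The remainder may come from bias terms (`"bias": "all"`) and the classification head, but that is a guess rather than something this diff confirms.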

adapter_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:54594596755c5a6be53f40096bdaa80d79320473cc84ec25b0cc41db622aa2cd
+size 15750833
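
The three lines above are a Git LFS pointer, not the weights themselves. A minimal sketch of checking a downloaded file against such a pointer follows; the function names are hypothetical and only the Python standard library is used.

```python
import hashlib

# Pointer text copied from this diff (without the leading "+").
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:54594596755c5a6be53f40096bdaa80d79320473cc84ec25b0cc41db622aa2cd
size 15750833
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True if the bytes have the size and digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash: {pointer['algo']}")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

To check a real download, read the file in binary mode and pass its bytes to `matches_pointer`. At roughly 15 MB, reading `adapter_model.bin` into memory in one go should be fine.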

special_tokens_map.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "cls_token": "<cls>",
+  "eos_token": "<eos>",
+  "mask_token": "<mask>",
+  "pad_token": "<pad>",
+  "unk_token": "<unk>"
+}

tokenizer_config.json ADDED

	@@ -0,0 +1,5 @@

+{
+  "clean_up_tokenization_spaces": true,
+  "model_max_length": 1000000000000000019884624838656,
+  "tokenizer_class": "EsmTokenizer"
+}
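
The `model_max_length` value above is exactly what `int(1e30)` evaluates to in Python, which `transformers` uses as a placeholder when no maximum length was configured. It is therefore not a usable truncation limit. The sketch below shows the check and a hypothetical helper that falls back to an explicit limit; the fallback of 1024 is taken from the `max_length=1024` used in the README example removed by this commit.

```python
# Value copied from tokenizer_config.json in this commit.
SAVED_MODEL_MAX_LENGTH = 1000000000000000019884624838656

# The float 1e30 is not exactly 10**30, hence the odd trailing digits.
is_sentinel = SAVED_MODEL_MAX_LENGTH == int(1e30)
print(is_sentinel)


def effective_max_length(saved_value: int, fallback: int = 1024) -> int:
    """Hypothetical helper: treat the placeholder as 'unset' and use a fallback."""
    return fallback if saved_value >= int(1e30) else saved_value


print(effective_max_length(SAVED_MODEL_MAX_LENGTH))
```

In practice this means that `max_length` should be passed explicitly when tokenizing with truncation, as the removed README example did.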

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8fffc4fc4da24eef90002cdc8ff78928507824695c61fdce3f26df01aff254ec
+size 4091

vocab.txt ADDED

	@@ -0,0 +1,33 @@

+<cls>
+<pad>
+<eos>
+<unk>
+L
+A
+G
+V
+S
+E
+R
+T
+I
+D
+P
+K
+Q
+N
+F
+Y
+M
+H
+W
+C
+X
+B
+U
+Z
+O
+.
+-
+<null_1>
+<mask>
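
To illustrate how the 33-entry vocabulary above maps a protein sequence to token ids, here is a simplified sketch. It assumes one token per residue with `<cls>` prepended and `<eos>` appended, and it is not the actual `EsmTokenizer` implementation. The `encode` function is hypothetical.

```python
# Vocabulary in the same order as vocab.txt; a token's id is its line index.
VOCAB = [
    "<cls>", "<pad>", "<eos>", "<unk>",
    *"LAGVSERTIDPKQNFYMHWCXBUZO.-",
    "<null_1>", "<mask>",
]
TOKEN_TO_ID = {token: index for index, token in enumerate(VOCAB)}


def encode(sequence: str) -> list[int]:
    """Map each residue to its id, wrapping the result in <cls> ... <eos>.

    Characters not in the vocabulary map to <unk>.
    """
    unk = TOKEN_TO_ID["<unk>"]
    body = [TOKEN_TO_ID.get(residue, unk) for residue in sequence]
    return [TOKEN_TO_ID["<cls>"], *body, TOKEN_TO_ID["<eos>"]]


print(len(VOCAB))
print(encode("MAV"))
```

Because the model labels every token, predictions at the `<cls>`, `<eos>`, and `<pad>` positions need to be dropped before reading per-residue results, which is what the removed README example did.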