Update src/about.py
- src/about.py +12 -12
src/about.py
CHANGED
@@ -35,21 +35,21 @@ LLM_BENCHMARKS_TEXT = f"""
|
|
35 |
### PROBE runs benchmark analyses on protein representation/feature vectors of any representation learning method in order to evaluate its predictive performance on protein function related predictive tasks, and to compare it with other methods from the literature.
|
36 |
|
37 |
### Aiming to evaluate how much each representation model captures different facets of functional information, we constructed and applied 4 independent benchmark tests based on:
|
38 |
-
|
39 |
-
|
40 |
-
|
41 |
|
42 |
-
|
43 |
-
|
44 |
-
|
45 |
|
46 |
-
|
47 |
-
|
48 |
-
|
49 |
|
50 |
-
|
51 |
-
|
52 |
-
|
53 |
|
54 |
### PROBE is part of the study entitled [Learning functional properties of proteins with language models](https://rdcu.be/cJAKN) which is schematically summarized in the figure below:<br/>
|
55 |
|
|
|
35 |
### PROBE runs benchmark analyses on protein representation/feature vectors of any representation learning method in order to evaluate its predictive performance on protein function related predictive tasks, and to compare it with other methods from the literature.
|
36 |
|
37 |
### Aiming to evaluate how much each representation model captures different facets of functional information, we constructed and applied 4 independent benchmark tests based on:
|
38 |
+
1. **Semantic Similarity Inference**:
|
39 |
+
- This benchmark evaluates how well protein representation models can infer functional similarities between proteins. Ground truth functional similarities are derived from Gene Ontology (GO) annotations.
|
40 |
+
- Different distance metrics (Cosine, Manhattan, Euclidean) are used to compute protein vector similarities, which are then correlated with the functional similarities.
|
41 |
|
42 |
+
2. **Ontology-Based Protein Function Prediction (PFP)**:
|
43 |
+
- This benchmark assesses the ability of representation models to predict ontology-based functional annotations (GO terms). The models are tested on how well they classify proteins based on molecular functions, biological processes, and cellular components.
|
44 |
+
- A linear classifier is used to ensure that the models themselves are responsible for good performance, rather than the complexity of the classifier.
|
45 |
|
46 |
+
3. **Drug Target Protein Family Classification**:
|
47 |
+
- This benchmark focuses on predicting the family of drug target proteins (enzymes, receptors, ion channels, etc.). This task tests the ability of models to learn structural features critical to these classifications.
|
48 |
+
- The study evaluates models using datasets with varying sequence similarity thresholds (random, 50%, 30%, 15%) to ensure the models can predict beyond simple sequence similarity.
|
49 |
|
50 |
+
4. **Protein–Protein Binding Affinity Estimation**:
|
51 |
+
- This benchmark evaluates models' ability to predict the change in binding affinities between proteins due to mutations. The dataset used is the **SKEMPI** dataset, which contains experimentally determined binding affinities.
|
52 |
+
- The task measures how well models can extract critical structural features important for protein-protein interactions.
|
53 |
|
54 |
### PROBE is part of the study entitled [Learning functional properties of proteins with language models](https://rdcu.be/cJAKN) which is schematically summarized in the figure below:<br/>
|
55 |
|