---
license: apache-2.0
datasets:
- BAAI/OPI
language:
- en
pipeline_tag: text-generation
tags:
- Life Science
- AI4Science
- Biology
- Protein
- LLM
- Instruction
base_model: facebook/galactica-6.7b
---

![OPI_logo](demo_figures/OPI_logo.png)

# Model Card of OPI-Galactica-6.7B

OPI-Galactica-6.7B was fine-tuned from the Galactica-6.7B model on the complete OPI dataset (i.e., [OPI_full_1.61M.json](https://huggingface.co/datasets/BAAI/OPI/blob/main/OPI_DATA/OPI_full_1.61M_train.json)). For more details on training and testing, please visit [https://github.com/baaihealth/opi](https://github.com/baaihealth/opi).

![Overview](demo_figures/OPI_experiment_outline.png)

# Evaluation of the OPI-Galactica-6.7B model on 9 tasks

Each result below was obtained by fine-tuning Galactica-6.7B on [OPI_full_1.61M.json](https://huggingface.co/datasets/BAAI/OPI/blob/main/OPI_DATA/OPI_full_1.61M_train.json) and then evaluating it on the test set of the corresponding task. A "-" marks a metric that is not reported for that task.

| Task Type | Task Name | Test File | Accuracy | Precision | Recall | F1 | Rouge-L |
|---|---|---|---|---|---|---|---|
| Sequence Understanding | EC Number Prediction (split100) | CLEAN_EC_number_new_test | - | 0.2700 | 0.2663 | 0.2596 | - |
| Sequence Understanding | EC Number Prediction (split100) | CLEAN_EC_number_price_test | - | 0.0268 | 0.0268 | 0.0268 | - |
| Sequence Understanding | Fold Type Prediction | fold_type_test_Fold_Holdout | 0.0808 | - | - | - | - |
| Sequence Understanding | Fold Type Prediction | fold_type_test_Superfamily_Holdout | 0.1348 | - | - | - | - |
| Sequence Understanding | Fold Type Prediction | fold_type_test_Family_Holdout | 0.4854 | - | - | - | - |
| Sequence Understanding | Subcellular Localization Prediction | subcell_loc_test | 0.7771 | - | - | - | - |
| Annotation Prediction | Function Keywords Prediction | CASPSimilarSeq_keywords_test | - | 0.8120 | 0.7360 | 0.7643 | - |
| Annotation Prediction | Function Keywords Prediction | IDFilterSeq_keywords_test | - | 0.8377 | 0.8019 | 0.8070 | - |
| Annotation Prediction | Function Keywords Prediction | UniProtSeq_keywords_test | - | 0.8596 | 0.8196 | 0.8276 | - |
| Annotation Prediction | Gene Ontology (GO) Terms Prediction | CASPSimilarSeq_go_terms_test | - | 0.7613 | 0.7492 | 0.7476 | - |
| Annotation Prediction | Gene Ontology (GO) Terms Prediction | IDFilterSeq_go_terms_test | - | 0.7404 | 0.7274 | 0.7207 | - |
| Annotation Prediction | Gene Ontology (GO) Terms Prediction | UniProtSeq_go_terms_test | - | 0.7638 | 0.7373 | 0.7358 | - |
| Annotation Prediction | Function Description Prediction | CASPSimilarSeq_function_test | - | - | - | - | 0.7430 |
| Annotation Prediction | Function Description Prediction | IDFilterSeq_function_test | - | - | - | - | 0.7014 |
| Annotation Prediction | Function Description Prediction | UniProtSeq_function_test | - | - | - | - | 0.7133 |
| Knowledge Mining | Tissue Location Prediction from Gene Symbol | gene_symbol_to_tissue_test | - | 0.3917 | 0.9077 | 0.5303 | - |
| Knowledge Mining | Cancer Prediction from Gene Symbol | gene_symbol_to_cancer_test | - | 0.3555 | 0.3189 | 0.3229 | - |
| Knowledge Mining | Cancer Prediction from Gene Name | gene_name_to_cancer_test | - | 0.2728 | 0.2554 | 0.2533 | - |
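
To make the metric columns more concrete, here is a minimal sketch of how such scores could be computed. It assumes that, for the set-valued tasks (e.g. keywords, GO terms, tissues, cancers), precision, recall and F1 are computed per protein or gene on the predicted vs. reference label sets and then averaged, and that Rouge-L is the LCS-based F-measure over whitespace tokens. These are assumptions, and the helper names (`set_prf`, `mean_prf`, `lcs_length`, `rouge_l_f`) are hypothetical; the authoritative evaluation scripts are in the GitHub repository linked above.

```python
from typing import Iterable, List, Sequence, Tuple


def set_prf(pred: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 between one predicted and one reference label set.

    Hypothetical helper: assumes set-based matching of labels (order ignored).
    """
    p, g = set(pred), set(gold)
    if not p or not g:
        return 0.0, 0.0, 0.0
    tp = len(p & g)  # labels present in both sets
    precision = tp / len(p)
    recall = tp / len(g)
    f1 = 0.0 if tp == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def mean_prf(preds: List[Iterable[str]], golds: List[Iterable[str]]) -> Tuple[float, float, float]:
    """Average per-sample precision, recall and F1 (assumed averaging scheme)."""
    scores = [set_prf(p, g) for p, g in zip(preds, golds)]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f(candidate: str, reference: str) -> float:
    """Rouge-L F-measure (beta = 1) over whitespace tokens (assumed tokenization)."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 2 of 3 predicted keywords are correct and 2 of 3 reference keywords are found.
    print(set_prf(["Kinase", "ATP-binding", "Transferase"],
                  ["Kinase", "Transferase", "Magnesium"]))  # all three ≈ 0.6667
    # LCS = "protein binds ATP" (3 tokens): precision 3/4, recall 3/6.
    print(rouge_l_f("the protein binds ATP",
                    "this protein binds ATP and magnesium"))  # ≈ 0.6
```

If the reported scores are indeed averaged per sample as assumed here, that would explain why the F1 column need not equal the harmonic mean of the precision and recall columns (for example, the tissue-location row).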