license: cc-by-nc-4.0

Precious3-GPT

A multi-omics multi-species language model.

  • Developer: Insilico Medicine
  • License: cc-by-nc-4.0
  • Model size: 88.3 million parameters
  • Domain: Biomedical
  • Base architecture: MPT

Quickstart

Precious3-GPT can be loaded and run as a standard causal language model through the transformers interface:

# Load model and tokenizer
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("insilicomedicine/precious3-gpt", trust_remote_code=True)
model = AutoModel.from_pretrained("insilicomedicine/precious3-gpt", trust_remote_code=True)

However, to make all of the Precious3-GPT functionality convenient to use, we provide a handler.

Run the model using the Precious3-GPT handler, step by step

Step 1 - download the Precious3-GPT handler.py

from handler import EndpointHandler
precious3gpt_handler = EndpointHandler()

Step 2 - create input for the handler

import json
with open('./generation-configs/meta2diff.json', 'r') as f:
    config_data = json.load(f)

# prepare request configuration
request_config = {"inputs": config_data, "mode": "meta2diff", "parameters": {
    "temperature": 0.8,
    "top_p": 0.2,
    "top_k": 3550,
    "n_next_tokens": 50,
    "random_seed": 137
}}

How Precious3-GPT will see the given request

[BOS]<age_group2diff2age_group><disease2diff2disease><compound2diff2compound><tissue>lung </tissue><age_individ></age_individ><cell></cell><efo>EFO_0000768 </efo><datatype>expression </datatype><drug>curcumin </drug><dose></dose><time></time><case>70.0-80.0 80.0-90.0 </case><control></control><dataset_type></dataset_type><gender>m </gender><species>human </species>

Step 3 - run Precious3-GPT

output = precious3gpt_handler(request_config)

Handler output structure

{
    "output": {
        "up": List, 
        "down": List
    },
    "mode": String, // Generation mode was selected
    "message": "Done!",  // or Error
    "input": String // Input prompt was passed

}

Note: if the selected mode generates compounds, the output also contains compounds: List.
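
Reading the result can be sketched as follows. This is a minimal sketch, assuming the handler returns a plain dict shaped like the structure above. `extract_signature` is a hypothetical helper, not part of handler.py, and the sample response is hand-written and abbreviated from Example 1 of this README rather than produced by a model run.

```python
def extract_signature(response: dict) -> dict:
    """Return flat up/down gene lists from a handler-style response.

    Hypothetical helper: assumes the dict layout documented above.
    """
    if response.get("message") != "Done!":
        raise RuntimeError(f"Generation failed: {response.get('message')}")

    def flatten(items):
        # The README examples show one inner list per generated signature,
        # so nested lists are flattened by one level.
        flat = []
        for item in items:
            if isinstance(item, list):
                flat.extend(item)
            else:
                flat.append(item)
        return flat

    output = response.get("output", {})
    return {
        "up": flatten(output.get("up", [])),
        "down": flatten(output.get("down", [])),
    }


# Abbreviated, hand-written sample (not a real model run).
sample = {
    "output": {"up": [["PTGDR2", "CABYR"]], "down": [["MB", "OR10V1"]]},
    "mode": "meta2diff",
    "message": "Done!",
    "input": "[BOS]",  # placeholder, the real prompt is much longer
}
signature = extract_signature(sample)
print(signature["up"], signature["down"])
```

Raising on any message other than "Done!" is a design choice of this sketch; callers that prefer to inspect errors themselves can check `response["message"]` directly.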

Precious3-GPT request configuration

Generation Modes (mode in config)

Choose the appropriate mode based on your requirements:

  1. meta2diff: Generate a signature (lists of up- and down-regulated genes) given meta-data such as tissue, compound, gender, etc.
  2. diff2compound: Predict compounds based on a signature.
  3. meta2diff2compound: Generate signatures given meta-data, then predict compounds based on the generated signatures.

Instruction (inputs.instruction in config)

  1. disease2diff2disease - generate signature for disease / predict disease based on given signature
  2. compound2diff2compound - generate signature for compound / predict compound based on given signature
  3. age_group2diff2age_group - generate signature for age group / predict age group based on signature
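
Before sending a request, the mode and instruction names can be checked against the values listed above. The following is a minimal sketch. `check_request` is a hypothetical helper and not part of handler.py, and whether the handler itself performs a similar check is not stated here.

```python
# Modes and instructions as listed in this README.
VALID_MODES = {"meta2diff", "diff2compound", "meta2diff2compound"}
VALID_INSTRUCTIONS = {
    "disease2diff2disease",
    "compound2diff2compound",
    "age_group2diff2age_group",
}


def check_request(request: dict) -> list:
    """Return a list of problems found in a request config (empty if none)."""
    problems = []
    mode = request.get("mode")
    if mode not in VALID_MODES:
        problems.append(f"unknown mode: {mode!r}")
    for name in request.get("inputs", {}).get("instruction", []):
        if name not in VALID_INSTRUCTIONS:
            problems.append(f"unknown instruction: {name!r}")
    return problems


good = {"mode": "meta2diff", "inputs": {"instruction": ["disease2diff2disease"]}}
bad = {"mode": "meta2dif", "inputs": {"instruction": ["disease2diff"]}}
print(check_request(good))
print(check_request(bad))
```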

Other meta-data (inputs. in config)

The full list of available values for each meta-data item can be found in p3_entities_with_type.csv.

Examples

In the following examples all possible configuration fields are specified. You can leave some meta-data fields in the inputs section as an empty string ("") or an empty list ([]).
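
One way to avoid typing every empty field by hand is to start from a template and override only the fields you need. This is a minimal sketch: the field names are taken from the example configurations in this README, while `EMPTY_INPUTS` and `make_inputs` are hypothetical names. Which empty type (string or list) the handler expects for each field is an assumption based on those examples.

```python
import copy

# Field names copied from the example configurations in this README.
# The choice of "" versus [] per field is an assumption.
EMPTY_INPUTS = {
    "instruction": [], "tissue": [], "age": "", "cell": "", "efo": "",
    "datatype": "", "drug": "", "dose": "", "time": "", "case": "",
    "control": "", "dataset_type": "", "gender": "", "species": "",
    "up": [], "down": [],
}


def make_inputs(**fields) -> dict:
    """Build an inputs dict with every field present, empty unless overridden."""
    unknown = set(fields) - set(EMPTY_INPUTS)
    if unknown:
        raise KeyError(f"unknown input fields: {sorted(unknown)}")
    inputs = copy.deepcopy(EMPTY_INPUTS)  # keep the template itself unchanged
    inputs.update(fields)
    return inputs


inputs = make_inputs(
    instruction=["disease2diff2disease"],
    tissue=["whole blood"],
    gender="m",
    species="human",
)
print(inputs)
```

Rejecting unknown field names catches typos such as `tisue` early, before the request reaches the model.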

Example 1

To generate a signature from specific meta-data, use the following configuration. Note that the up and down fields are empty lists because the model generates them. Here we ask the model to generate a signature for a male human in the 70-90 age group, in lung tissue, with disease EFO_0000768.

{
    "inputs": {
        "instruction": ["age_group2diff2age_group", "disease2diff2disease", "compound2diff2compound"], 
        "tissue": ["lung"],
        "age": "",
        "cell": "", 
        "efo": "EFO_0000768", 
        "datatype": "", "drug": "", "dose": "", "time": "", "case": ["70.0-80.0", "80.0-90.0"], "control": "", "dataset_type": "expression", "gender": "m", "species": "human", "up": [], "down": []
    }, 
    "mode": "meta2diff", 
    "parameters": {
        "temperature": 0.8, "top_p": 0.2, "top_k": 3550, "n_next_tokens": 50, "random_seed": 137
    }
}

Here is the output:

{
  "output": {
    "up": [["PTGDR2", "CABYR", "MGAM", "TMED9", "SHOX2", "MAT1A", "MUC5AC", "GASK1B", "CYP1A2", "RP11-266K4.9", ...]], // generated list of up-regulated genes
    "down": [["MB", "OR10V1", "OR51H1", "GOLGA6L10", "OR6M1", "CDX4", "OR4C45", "SPRR2A", "SPDYE9", "GBX2", "ATP4B", ...]] // generated list of down-regulated genes
  },
  "mode": "meta2diff", // generation mode we specified
  "message": "Done!",
  "input": "[BOS]<age_group2diff2age_group><disease2diff2disease><compound2diff2compound><tissue>lung </tissue><cell></cell><efo>EFO_0000768 </efo><datatype></datatype><drug></drug><dose></dose><time></time><case>70.0-80.0 80.0-90.0 </case><control></control><dataset_type>expression </dataset_type><gender>m </gender><species>human </species>", // actual input prompt for the model
  "random_seed": 137
}
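
The "input" string in this output appears to follow a simple tag layout: the instructions first, then one tag per meta-data field, with each value followed by a space. The sketch below reproduces that layout from the Example 1 configuration. `build_prompt` is a hypothetical function written for illustration only; the actual logic in handler.py may differ. In particular, the prompt shown in Step 2 contains an `<age_individ>` tag that this sketch does not emit, and skipping the age, up and down fields is an assumption inferred from this one example.

```python
def build_prompt(inputs: dict) -> str:
    """Hypothetical re-creation of the prompt layout seen in Example 1."""
    skipped = {"instruction", "age", "up", "down"}  # assumption, see lead-in
    parts = ["[BOS]"]
    # Each instruction becomes a standalone tag right after [BOS].
    parts += [f"<{name}>" for name in inputs.get("instruction", [])]
    for key, value in inputs.items():
        if key in skipped:
            continue
        values = value if isinstance(value, list) else [value]
        # Every non-empty value is followed by a single space.
        body = "".join(f"{v} " for v in values if v != "")
        parts.append(f"<{key}>{body}</{key}>")
    return "".join(parts)


# The "inputs" section of Example 1.
example_inputs = {
    "instruction": ["age_group2diff2age_group", "disease2diff2disease",
                    "compound2diff2compound"],
    "tissue": ["lung"], "age": "", "cell": "", "efo": "EFO_0000768",
    "datatype": "", "drug": "", "dose": "", "time": "",
    "case": ["70.0-80.0", "80.0-90.0"], "control": "",
    "dataset_type": "expression", "gender": "m", "species": "human",
    "up": [], "down": [],
}
prompt = build_prompt(example_inputs)
print(prompt)
```

The sketch relies on the key order of the inputs dict, so fields must be listed in the same order as in the example for the tags to line up.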

Example 2

Now let's generate a signature for a healthy male human in the 40-50 age group, in whole blood tissue. Note that we use the disease2diff2disease instruction here even though we expect a signature for a healthy human, which is why efo is set to an empty string "". For this example we also add one more instruction, giving "instruction": ["disease2diff2disease", "age_group2diff2age_group"].

{
    "inputs": {
        "instruction": ["disease2diff2disease", "age_group2diff2age_group"],
        "tissue": ["whole blood"],
        "age": "",
        "cell": "",
        "efo": "",
        "datatype": "", "drug": "", "dose": "", "time": "", "case": "40.0-50.0", "control": "", "dataset_type": "expression", "gender": "m", "species": "human", "up": [],
        "down": []
    },
    "mode": "meta2diff",
    "parameters": {
        "temperature": 0.8,
        "top_p": 0.2,
        "top_k": 3550,
        "n_next_tokens": 50,
        "random_seed": 137
    }
}

Here is the output:

{
  "output": {
    "up": [["IER3", "APOC2", "EDNRB", "JAKMIP2", "BACE2", ... ]],
    "down": [["TBL1Y", "TDP1", "PLPP4", "CPEB1", "ITPR3", ... ]] 
  },
  "mode": "meta2diff",
  "message": "Done!",
  "input": "[BOS]<disease2diff2disease><age_group2diff2age_group><tissue>whole blood </tissue><cell></cell><efo></efo><datatype></datatype><drug></drug><dose></dose><time></time><case>40.0-50.0 </case><control></control><dataset_type>expression </dataset_type><gender>m </gender><species>human </species>",
  "random_seed": 137
}