license: cc-by-nc-4.0

Precious3-GPT-Multi-Modal inference

Model inference runs on a Hugging Face Inference Endpoint.

Definitions

  • Signature: up- and down-gene lists

Generation config .json Overview

The following example specifies every available configuration field. Some meta-data fields in the inputs section can be left as an empty string ("") or an empty list ([]).

For example, to generate a signature from specific meta-data, use the following configuration. Note that the up and down fields are empty lists, because they are what you want to generate.

Another example - predict compound based on signature. You can take

{
    "inputs": {
        "instruction": "compound2diff2compound",
        "tissue": ["whole blood"],
        "age": "",
        "cell": "u937",
        "efo": "Orphanet_139399",
        "datatype": "",
        "drug": "",
        "dose": "",
        "time": "",
        "case": "",
        "control": "",
        "dataset_type": "expression",
        "gender": "m",
        "species": "human",
        "up": [],
        "down": []
    },
    "mode": "meta2diff",
    "parameters": {
        "temperature": 0.8,
        "top_p": 0.2,
        "top_k": 3550,
        "n_next_tokens": 50
    }
}
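A configuration like the one above can also be assembled in Python instead of being written by hand. The following is a minimal sketch: `build_config` and `EMPTY_INPUTS` are hypothetical names introduced here for illustration and are not part of the model's API. The field names and default parameters are copied from the example above.

```python
import json

# Meta-data fields from the example config; "" or [] means "not specified".
EMPTY_INPUTS = {
    "instruction": "", "tissue": [], "age": "", "cell": "", "efo": "",
    "datatype": "", "drug": "", "dose": "", "time": "", "case": "",
    "control": "", "dataset_type": "", "gender": "", "species": "",
    "up": [], "down": [],
}

def build_config(mode, instruction, parameters=None, **meta):
    """Return a generation config dict; unspecified meta-data stays empty.

    Hypothetical helper, not part of the endpoint API.
    """
    unknown = set(meta) - set(EMPTY_INPUTS)
    if unknown:
        raise KeyError(f"unknown meta-data fields: {sorted(unknown)}")
    # Copy list defaults so separate configs never share list objects.
    inputs = {k: (list(v) if isinstance(v, list) else v)
              for k, v in EMPTY_INPUTS.items()}
    inputs["instruction"] = instruction
    inputs.update(meta)
    # Default sampling parameters taken from the example config.
    params = {"temperature": 0.8, "top_p": 0.2, "top_k": 3550, "n_next_tokens": 50}
    params.update(parameters or {})
    return {"inputs": inputs, "mode": mode, "parameters": params}

config = build_config(
    "meta2diff", "compound2diff2compound",
    tissue=["whole blood"], cell="u937", efo="Orphanet_139399",
    dataset_type="expression", gender="m", species="human",
)
print(json.dumps(config, indent=4))
```

Rejecting unknown keyword arguments catches misspelled field names before a request is sent.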

Run generation

Step 1. Define the endpoint and the query function


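import requests

# URL of the Hugging Face Inference Endpoint that serves the model
API_URL = "https://cu2s6lgb4jew3tht.us-east-1.aws.endpoints.huggingface.cloud"
headers = {
    "Accept": "application/json",
    "Authorization": "Bearer hf_XXXX",  # replace hf_XXXX with your own access token
    "Content-Type": "application/json",
}

def query(payload):
    """POST a generation config to the endpoint and return the parsed JSON response."""
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()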

Step 2. Load and prepare the configuration

import json

# Load the meta-data fields that are sent as the "inputs" section
with open('./generation-configs/meta2diff.json', 'r') as f:
    config_data = json.load(f)

# Prepare the request payload: inputs, generation mode, and sampling parameters
config_sample = {
    "inputs": config_data,
    "mode": "diff2compound",
    "parameters": {
        "temperature": 0.4,
        "top_p": 0.2,
        "top_k": 3550,
        "n_next_tokens": 50
    }
}

Step 2 - prepared configuration

{
    "inputs": {
        "instruction": "disease2diff2disease",
        "tissue": ["whole blood"],
        "age": "",
        "cell": "u937",
        "efo": "Orphanet_139399",
        "datatype": "",
        "drug": "",
        "dose": "",
        "time": "",
        "case": "",
        "control": "",
        "dataset_type": "expression",
        "gender": "m",
        "species": "human",
        "up": [],
        "down": []
    },
    "mode": "meta2diff2compound",
    "parameters": {
        "temperature": 0.8,
        "top_p": 0.2,
        "top_k": 3550,
        "n_next_tokens": 50
    }
}

Step 3. Send request to endpoint

output = query(config_sample)

Generation Modes (mode in config)

Choose the appropriate mode based on your requirements:

  1. meta2diff: Generate a signature from meta-data such as tissue, compound, gender, etc.
  2. diff2compound: Predict compounds from a signature.
  3. meta2diff2compound: Generate a signature from meta-data, then predict compounds from the generated signature.
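The modes differ in whether the signature is an input or an output. The sketch below checks that a config is consistent with its mode. It assumes that meta2diff and meta2diff2compound expect empty up and down lists (as in the example config, where they are left empty for generation) and that diff2compound expects at least one non-empty list. `check_mode` is a hypothetical helper, not part of the endpoint API.

```python
MODES = ("meta2diff", "diff2compound", "meta2diff2compound")

def check_mode(config):
    """Return a list of problems; an empty list means the config looks consistent.

    Assumption: only diff2compound takes a signature as input.
    """
    mode = config.get("mode")
    if mode not in MODES:
        return [f"unknown mode: {mode!r}"]
    inputs = config.get("inputs", {})
    has_signature = bool(inputs.get("up")) or bool(inputs.get("down"))
    problems = []
    if mode == "diff2compound" and not has_signature:
        problems.append("diff2compound needs a signature: fill 'up' and/or 'down'")
    if mode != "diff2compound" and has_signature:
        problems.append(f"{mode} generates the signature: leave 'up' and 'down' empty")
    return problems

# A meta2diff config with empty gene lists passes the check.
print(check_mode({"mode": "meta2diff", "inputs": {"up": [], "down": []}}))
# A diff2compound config without a signature is flagged.
print(check_mode({"mode": "diff2compound", "inputs": {"up": [], "down": []}}))
```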

Instruction (inputs.instruction in config)

You can use the following instructions (one or several at a time):

  1. disease2diff2disease - generate a signature for a disease
  2. compound2diff2compound - generate a signature for a compound
  3. age_group2diff2age_group - generate a signature for an age group
  4. age_individ2diff2age_individ - generate a signature based on an age value
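Instruction names are long and easy to mistype, so checking them before a request is sent can save a round trip. The sketch below validates one or several instructions against the list above. `normalize_instructions` is a hypothetical helper, and representing several instructions as a Python list is an assumption: this document does not show how multiple instructions are encoded in the request.

```python
# Instruction names and descriptions taken from the list above.
INSTRUCTIONS = {
    "disease2diff2disease": "signature for a disease",
    "compound2diff2compound": "signature for a compound",
    "age_group2diff2age_group": "signature for an age group",
    "age_individ2diff2age_individ": "signature based on an age value",
}

def normalize_instructions(instruction):
    """Accept one instruction (str) or several (iterable); return a validated list.

    Assumption: several instructions are represented as a list.
    """
    items = [instruction] if isinstance(instruction, str) else list(instruction)
    unknown = [item for item in items if item not in INSTRUCTIONS]
    if unknown:
        raise ValueError(f"unknown instruction(s): {unknown}")
    return items

for name in normalize_instructions(["disease2diff2disease", "age_group2diff2age_group"]):
    print(f"{name}: {INSTRUCTIONS[name]}")
```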

Other meta-data (fields of inputs in config)

  1. Age (age): in years for human, in days for macaque and mouse.
  2. The full list of available values for each meta-data item is in p3_entities_with_type.csv.

Multi-Modality

Multi-modality applies by default in tasks where you pass a signature. For each gene in the up and down lists, the model gets embeddings from the Knowledge Graph and Text NNs. The embeddings are then averaged to obtain one embedding per modality for each gene list (4 averaged embeddings in total).
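The averaging step can be illustrated with a small, self-contained sketch. It is not the model's actual implementation: the gene names, the 2-dimensional toy vectors, and the `mean_embedding` and `signature_embeddings` helpers are all hypothetical, and plain Python lists stand in for real embedding tensors.

```python
def mean_embedding(vectors):
    """Element-wise average of equal-length vectors ([] if no vectors are given)."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def signature_embeddings(up, down, kg_emb, text_emb):
    """Return four averaged embeddings: one per (gene list, modality) pair."""
    result = {}
    for list_name, genes in (("up", up), ("down", down)):
        for modality, table in (("kg", kg_emb), ("text", text_emb)):
            result[(list_name, modality)] = mean_embedding([table[g] for g in genes])
    return result

# Toy per-gene embeddings; real ones would come from the Knowledge Graph and Text NNs.
kg = {"GENE_A": [1.0, 0.0], "GENE_B": [3.0, 2.0], "GENE_C": [0.0, 4.0]}
text = {"GENE_A": [0.0, 2.0], "GENE_B": [2.0, 0.0], "GENE_C": [1.0, 1.0]}

averaged = signature_embeddings(["GENE_A", "GENE_B"], ["GENE_C"], kg, text)
print(len(averaged))           # 4
print(averaged[("up", "kg")])  # [2.0, 1.0]
```

Whatever the length of each gene list, the result has a fixed size: two gene lists times two modalities.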