license: cc-by-nc-4.0
Precious3-GPT-Multi-Modal inference
Model inference runs on a Hugging Face Inference Endpoint.
Definitions
- Signature: a pair of gene lists, up-regulated (up) and down-regulated (down)
Generation config .json Overview
The following example specifies all possible configuration fields. Any meta-data field in the inputs section can be left as an empty string ("") or an empty list ([]).
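As a minimal sketch of the rule above, the helper below builds an inputs section in which every field defaults to an empty value, so only the fields of interest need to be set. The name `make_inputs` is hypothetical and not part of the model repository. The field names are copied from the examples in this document, and treating `tissue`, `up` and `down` as the only list-valued fields is an assumption based on those examples.

```python
import copy

# Field names copied from the example configurations in this document.
# Assumption: "tissue", "up" and "down" are lists, everything else is a string.
EMPTY_INPUTS = {
    "instruction": "",
    "tissue": [],
    "age": "",
    "cell": "",
    "efo": "",
    "datatype": "",
    "drug": "",
    "dose": "",
    "time": "",
    "case": "",
    "control": "",
    "dataset_type": "",
    "gender": "",
    "species": "",
    "up": [],
    "down": [],
}


def make_inputs(**overrides):
    """Return an inputs section with all fields empty except the given overrides."""
    unknown = set(overrides) - set(EMPTY_INPUTS)
    if unknown:
        raise KeyError(f"Unknown input field(s): {sorted(unknown)}")
    inputs = copy.deepcopy(EMPTY_INPUTS)  # deep copy so the shared lists stay empty
    inputs.update(overrides)
    return inputs


inputs = make_inputs(
    instruction="disease2diff2disease",
    tissue=["whole blood"],
    efo="Orphanet_139399",
    dataset_type="expression",
    gender="m",
    species="human",
)
```

Rejecting unknown field names is a design choice of this sketch: it catches misspelled keys before a request is sent.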
Example 1
If you want to generate a signature for specific meta-data, you can use the following configuration. Note that the up and down fields are empty lists because the model is expected to generate them.
{
  "inputs": {
    "instruction": "disease2diff2disease",
    "tissue": ["whole blood"],
    "age": "",
    "cell": "",
    "efo": "Orphanet_139399",
    "datatype": "",
    "drug": "",
    "dose": "",
    "time": "",
    "case": "",
    "control": "",
    "dataset_type": "expression",
    "gender": "m",
    "species": "human",
    "up": [],
    "down": []
  },
  "mode": "meta2diff",
  "parameters": {
    "temperature": 0.8,
    "top_p": 0.2,
    "top_k": 3550,
    "n_next_tokens": 50
  }
}
Here we ask the model to generate a signature for a human male, in whole blood tissue, with disease Orphanet_139399.
Example 2
Suppose you want to generate a signature for a healthy human male, 40 years old, in whole blood tissue.
{
  "inputs": {
    "instruction": "disease2diff2disease",
    "tissue": ["whole blood"],
    "age": 40,
    "cell": "",
    "efo": "",
    "datatype": "",
    "drug": "",
    "dose": "",
    "time": "",
    "case": "",
    "control": "",
    "dataset_type": "expression",
    "gender": "m",
    "species": "human",
    "up": [],
    "down": []
  },
  "mode": "meta2diff",
  "parameters": {
    "temperature": 0.8,
    "top_p": 0.2,
    "top_k": 3550,
    "n_next_tokens": 50
  }
}
Note that here we used the disease2diff2disease instruction but expect a signature for a healthy human, which is why efo is set to an empty string ("").
Alternatively, we can add a second instruction to Example 2: "instruction": ["disease2diff2disease", "age_individ2diff2age_individ"]
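The change from a single instruction to a list of instructions can be sketched in Python as follows. The helper `add_instruction` is a hypothetical name introduced for illustration, and the inputs dictionary is a trimmed copy of Example 2 (the remaining fields are omitted for brevity).

```python
import copy

# Trimmed version of the Example 2 inputs; the omitted fields would stay
# exactly as in the full configuration.
example_2_inputs = {
    "instruction": "disease2diff2disease",
    "tissue": ["whole blood"],
    "age": 40,
    "efo": "",
    "gender": "m",
    "species": "human",
    "up": [],
    "down": [],
}


def add_instruction(inputs, new_instruction):
    """Return a copy of inputs whose instruction field also lists new_instruction."""
    updated = copy.deepcopy(inputs)
    current = updated["instruction"]
    if isinstance(current, str):
        # A single instruction is stored as a plain string; "" means none.
        instructions = [current] if current else []
    else:
        instructions = list(current)
    if new_instruction not in instructions:
        instructions.append(new_instruction)
    updated["instruction"] = instructions
    return updated


combined = add_instruction(example_2_inputs, "age_individ2diff2age_individ")
print(combined["instruction"])
# ['disease2diff2disease', 'age_individ2diff2age_individ']
```

The original dictionary is left untouched, so the single-instruction and two-instruction variants can be sent side by side for comparison.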
Run generation step by step
Step 1 - connect to endpoint
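import requests

# URL of the deployed Hugging Face Inference Endpoint
API_URL = "https://cu2s6lgb4jew3tht.us-east-1.aws.endpoints.huggingface.cloud"

# Replace hf_XXXX with your own access token
headers = {
    "Accept": "application/json",
    "Authorization": "Bearer hf_XXXX",
    "Content-Type": "application/json"
}

def query(payload):
    """POST the payload to the endpoint and return the decoded JSON response."""
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()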
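To avoid hard-coding the access token in a script, the headers from Step 1 could be built from an environment variable instead. This is only a sketch: the function name `build_headers` and the variable name `HF_TOKEN` are assumptions made for illustration, not something the endpoint requires.

```python
import os


def build_headers(token=None):
    """Build the Step 1 request headers, reading the token from the environment if not given."""
    # HF_TOKEN is an assumed variable name; use whatever your setup defines.
    token = token or os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("No access token found; set HF_TOKEN or pass token=...")
    return {
        "Accept": "application/json",
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


# "hf_XXXX" is the same placeholder used in Step 1, not a real token.
headers = build_headers(token="hf_XXXX")
```

The resulting dictionary has the same three keys as the `headers` defined in Step 1, so it can be passed to `requests.post` unchanged.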
Step 2 - create input for endpoint
import json

# Load the inputs section from the generation config file
with open('./generation-configs/meta2diff.json', 'r') as f:
    config_data = json.load(f)

# Prepare the sample: wrap the inputs with the mode and sampling parameters
config_sample = {
    "inputs": config_data,
    "mode": "meta2diff",
    "parameters": {
        "temperature": 0.4,
        "top_p": 0.2,
        "top_k": 3550,
        "n_next_tokens": 50
    }
}
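If the config file is not available locally, the same structure can be assembled directly in Python. The sketch below uses a hypothetical helper, `make_sample`, whose default sampling parameters are copied from Example 1; only the parameter names that appear in this document are accepted.

```python
# Default sampling parameters copied from Example 1 in this document.
DEFAULT_PARAMETERS = {
    "temperature": 0.8,
    "top_p": 0.2,
    "top_k": 3550,
    "n_next_tokens": 50,
}


def make_sample(inputs, mode="meta2diff", **parameter_overrides):
    """Wrap an inputs section with a mode and sampling parameters."""
    unknown = set(parameter_overrides) - set(DEFAULT_PARAMETERS)
    if unknown:
        raise KeyError(f"Unknown parameter(s): {sorted(unknown)}")
    # Overrides win over the defaults; the defaults themselves are not modified.
    parameters = {**DEFAULT_PARAMETERS, **parameter_overrides}
    return {"inputs": inputs, "mode": mode, "parameters": parameters}


# Trimmed inputs for illustration; a real request would carry all fields.
config_sample = make_sample(
    {"instruction": "disease2diff2disease", "tissue": ["whole blood"], "up": [], "down": []},
    temperature=0.4,
)
```

The returned dictionary has the same three top-level keys ("inputs", "mode", "parameters") as the sample built in Step 2.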
Expected input at Step 2.
{
  "inputs": {
    "instruction": "disease2diff2disease",
    "tissue": ["whole blood"],
    "age": "",
    "cell": "",
    "efo": "Orphanet_139399",
    "datatype": "",
    "drug": "",
    "dose": "",
    "time": "",
    "case": "",
    "control": "",
    "dataset_type": "expression",
    "gender": "m",
    "species": "human",
    "up": [],
    "down": []
  },
  "mode": "meta2diff",
  "parameters": {
    "temperature": 0.4,
    "top_p": 0.2,
    "top_k": 3550,
    "n_next_tokens": 50
  }
}
Step 3 - send request to endpoint
output = query(config_sample)
Generation Modes (mode in config)
Choose the appropriate mode based on your requirements:
- meta2diff: Generate signature given meta-data such as tissue, compound, gender, etc.
- diff2compound: Predict compounds based on signature.
- meta2diff2compound: Generate signatures given meta-data and then predict compounds based on generated signatures.
Instruction (inputs.instruction in config)
You can use the following instructions (one or several at a time):
- disease2diff2disease - generate signature for disease
- compound2diff2compound - generate signature for compound
- age_group2diff2age_group - generate signature for age-group
- age_individ2diff2age_individ - generate signature based on age value
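A request can be checked against the mode and instruction names listed in this document before it is sent. The following is a minimal sketch: `check_sample` is a hypothetical helper, and the two sets contain only the values named here, so they may not be exhaustive for the deployed model.

```python
# Values copied from the mode and instruction lists in this document.
VALID_MODES = {"meta2diff", "diff2compound", "meta2diff2compound"}
VALID_INSTRUCTIONS = {
    "disease2diff2disease",
    "compound2diff2compound",
    "age_group2diff2age_group",
    "age_individ2diff2age_individ",
}


def check_sample(config_sample):
    """Return a list of problems found in mode and inputs.instruction (empty if none)."""
    problems = []
    mode = config_sample.get("mode")
    if mode not in VALID_MODES:
        problems.append(f"unknown mode: {mode!r}")
    instruction = config_sample.get("inputs", {}).get("instruction", [])
    # A single instruction may be given as a string, several as a list.
    instructions = [instruction] if isinstance(instruction, str) else list(instruction)
    for name in instructions:
        if name not in VALID_INSTRUCTIONS:
            problems.append(f"unknown instruction: {name!r}")
    return problems


good = {
    "inputs": {"instruction": ["disease2diff2disease", "age_individ2diff2age_individ"]},
    "mode": "meta2diff",
}
bad = {"inputs": {"instruction": "disease2dif2disease"}, "mode": "meta2dif"}

print(check_sample(good))  # []
print(check_sample(bad))
# ["unknown mode: 'meta2dif'", "unknown instruction: 'disease2dif2disease'"]
```

Returning a list of problems rather than raising on the first one lets all misspellings in a config be reported in a single pass.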
Other meta-data (inputs.* in config)
- Age (age): in years for human, in days for macaque and mouse
- The full list of available values for each meta-data item can be found in p3_entities_with_type.csv
Multi-Modality
Multi-modality applies by default in tasks where you pass a signature. For each gene in the up and down lists, the model retrieves embeddings from the Knowledge Graph and Text neural networks. The embeddings are then averaged to obtain one embedding per modality per gene list (2 modalities x 2 lists = 4 averaged embeddings in total).
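The averaging step can be illustrated with a small, self-contained sketch. It is not the model's actual implementation: the gene names, the 2-dimensional vectors and the dictionary layout are all invented for illustration, and real embeddings would come from the Knowledge Graph and Text networks.

```python
def mean_embedding(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(values) / count for values in zip(*vectors)]


# Toy 2-dimensional embeddings, invented purely for illustration.
kg_embeddings = {"GENE_A": [1.0, 0.0], "GENE_B": [3.0, 2.0], "GENE_C": [0.0, 4.0]}
text_embeddings = {"GENE_A": [0.5, 0.5], "GENE_B": [1.5, 2.5], "GENE_C": [2.0, 2.0]}

signature = {"up": ["GENE_A", "GENE_B"], "down": ["GENE_C"]}
modalities = {"knowledge_graph": kg_embeddings, "text": text_embeddings}

# One averaged embedding per (modality, gene list) pair: 2 x 2 = 4 in total.
averaged = {
    (modality, direction): mean_embedding([table[gene] for gene in genes])
    for modality, table in modalities.items()
    for direction, genes in signature.items()
}

for key, vector in averaged.items():
    print(key, vector)
# ('knowledge_graph', 'up') [2.0, 1.0]
# ('knowledge_graph', 'down') [0.0, 4.0]
# ('text', 'up') [1.0, 1.5]
# ('text', 'down') [2.0, 2.0]
```

However many genes each list contains, the result is always four vectors, which matches the count stated above.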