{ "cells": [ { "cell_type": "code", "execution_count": 70, "id": "c7317218", "metadata": {}, "outputs": [], "source": [ "import requests\n", "from copy import copy as cp\n" ] }, { "cell_type": "markdown", "id": "c022e07b", "metadata": {}, "source": [ "## Authorize with the endpoint" ] }, { "cell_type": "code", "execution_count": 2, "id": "f1272e3f", "metadata": {}, "outputs": [], "source": [ "API_URL = \"https://YOUR.ENDPOINT.aws.endpoints.huggingface.cloud\"\n", "headers = {\n", " \"Accept\" : \"application/json\",\n", " \"Authorization\": \"Bearer hf_YOUR_TOKEN\",\n", " \"Content-Type\": \"application/json\"\n", "}\n", "\n", "def query(payload):\n", " response = requests.post(API_URL, headers=headers, json=payload)\n", " return response.json()" ] }, { "cell_type": "markdown", "id": "082c3300", "metadata": {}, "source": [ "## Construct the query\n", "Instructions define what type of experiment you are trying to simulate with P3GPT.
\n", "Key instructions enabled at this endpoint include:\n", "- **`disease2diff2disease`**: For tasks that are equivalent to case-control cross-sectional settings, e.g. the generation of DEGs for a medical condition;\n", "- **`compound2diff2compound`**: For compound screening tasks, e.g. proposing a compound that can selectively methylate certain gene promoters;\n", "- **`age_group2diff2age_group`**: For tasks on aging-related omics dynamics, e.g. identifying genes that are up-/down-regulated in older vs younger adults.\n" ] }, { "cell_type": "code", "execution_count": 139, "id": "fd84fc60", "metadata": {}, "outputs": [], "source": [ "prompt = {'instruction': ['age_group2diff2age_group','compound2diff2compound'], \n", " # This is a chemical screening experiment in a particular age group, \n", " # so you'll need to use 2 instructions\n", " 'tissue': 'lung',\n", " 'age': 70,\n", " 'cell': '',\n", " 'efo': 'EFO_0000768', # pulmonary fibrosis\n", " 'datatype': 'expression', # we want to get DEGs\n", " 'drug': 'curcumin',\n", " 'dose': '',\n", " 'time': '',\n", " 'case': ['70.0-80.0', '80.0-90.0'], # define the age groups of interest\n", " 'control': '', # left blank since no healthy controls participate in this experiment\n", " 'dataset_type': '',\n", " 'gender': 'm',\n", " 'species': 'human',\n", " 'up': [], # left blank to be filled in by P3GPT\n", " 'down': []\n", " }\n", "\n" ] }, { "cell_type": "markdown", "id": "609bd3c0", "metadata": {}, "source": [ "## Execution modes\n", "- **`meta2diff`**: `compound2diff2compound` can be executed in either direction. This mode tells P3GPT to return differentially expressed genes and not compounds;\n", "- **`diff2compound`**: The reverse of the `meta2diff` mode. Make sure to fill in 'up' and 'down' in the prompt first!\n", "- **`meta2diff2compound`**: Runs `meta2diff` first and applies `diff2compound` to its output. 
This is mostly a convenience: you get to run P3GPT twice with one call.\n", "\n", "As an LLM, P3GPT is trained to fill in the blanks in its prompt that the instructions point to. Its native output has the same structure as the input prompt.
\n", "Modes do not belong in the prompt and are used for parsing P3GPT's output so that only the expected part of the completed prompt is presented to the user." ] }, { "cell_type": "code", "execution_count": 140, "id": "c6280337", "metadata": {}, "outputs": [], "source": [ "config_sample = {'inputs': prompt,\n", " 'mode': 'meta2diff', # this is a chemical screening experiment \n", " 'parameters': {'temperature': 0.4,\n", " 'top_p': 0.8,\n", " 'top_k': 3550,\n", " 'n_next_tokens': 20}\n", " }\n", "output = query(config_sample) # send request to Hugging Face" ] }, { "cell_type": "code", "execution_count": 141, "id": "47a3f882", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dict_keys(['output', 'mode', 'message', 'input'])\n" ] } ], "source": [ "print(output.keys())" ] }, { "cell_type": "code", "execution_count": 142, "id": "5408079c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Done!'" ] }, "execution_count": 142, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# successful generation\n", "output['message']" ] }, { "cell_type": "code", "execution_count": 143, "id": "f51d4314", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'[BOS]lung 70 EFO_0000768 expression curcumin 70.0-80.0 80.0-90.0 m human '" ] }, "execution_count": 143, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# this is what actual P3GPT input looks like\n", "# NB: there is no 'mode' in the prompt. 
\n", "output['input']" ] }, { "cell_type": "code", "execution_count": 144, "id": "08c9f49a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Up-regulated genes:\n", "MUC5B; AHSP; ALAS2; SLC4A1; CDHR5; NXF2B; CYP4F3; LGALS7B; FBN3; NTS; CYSTM1; ORM2; ASL; CD177; GLRX5; H4C3; NDUFA3; TUBA4B; EPB42; GCHFR\n", "\n", "Down-regulated genes:\n", "KRT6A; KRT5; KRT15; KRT14; KRT6B; DSG3; CALML3; S100A7; SERPINB5; SPRR2A; SPRR3; LY6D; TMEM45A; KRT16; S100A9; GOLGA8A; SPINK6; CXCL10; CXCL9; CSTA\n", "\n" ] } ], "source": [ "# output gene symbols\n", "genes_up, genes_dn = output['output']['up'][0], output['output']['down'][0]\n", "print(\"Up-regulated genes:\")\n", "print(*genes_up[:20], sep = \"; \",end='\\n\\n')\n", "print(\"Down-regulated genes:\")\n", "print(*genes_dn[:20], sep = \"; \",end='\\n\\n')\n" ] }, { "cell_type": "code", "execution_count": 145, "id": "f6910a3d", "metadata": {}, "outputs": [], "source": [ "# now, let's do the opposite and get compounds based on these DEG lists\n", "# to do that, we only need a couple of changes to the original prompt\n", "prompt2 = cp(prompt)\n", "prompt2.update({\n", " 'drug':'',\n", " 'up':genes_up,\n", " 'down':genes_dn\n", " })\n", "# remember to switch the mode from meta2diff to diff2compound!\n", "config_sample.update({'mode':'diff2compound',\n", " 'inputs':prompt2})" ] }, { "cell_type": "code", "execution_count": 146, "id": "e791e285", "metadata": {}, "outputs": [], "source": [ "output = query(config_sample) # send request to Hugging Face" ] }, { "cell_type": "code", "execution_count": 127, "id": "8ae15313", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['output', 'compounds', 'raw_output', 'mode', 'message', 'input'])" ] }, "execution_count": 127, "metadata": {}, "output_type": "execute_result" } ], "source": [ "output.keys()" ] }, { "cell_type": "code", "execution_count": 147, "id": "5f35f00c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "artemisinin; 
todralazine; dyphylline; esmolol; formestane; z160; netupitant; brd-k89304341; isoprenaline\n" ] } ], "source": [ "print(*output['compounds'][0], sep='; ')" ] }, { "cell_type": "code", "execution_count": 175, "id": "5d883cf8", "metadata": {}, "outputs": [], "source": [ "# alternatively, use the meta2diff2compound mode to get straight to compounds\n", "prompt3 = cp(prompt)\n", "prompt3.update({'instruction':['compound2diff2compound']})\n", "config_sample.update({'mode':'meta2diff2compound',\n", " 'inputs':prompt3})" ] }, { "cell_type": "code", "execution_count": 176, "id": "c2adb995", "metadata": {}, "outputs": [], "source": [ "output = query(config_sample)" ] }, { "cell_type": "code", "execution_count": 178, "id": "99da6eb8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'instruction': ['compound2diff2compound'],\n", " 'tissue': 'lung',\n", " 'age': 70,\n", " 'cell': '',\n", " 'efo': 'EFO_0000768',\n", " 'datatype': 'expression',\n", " 'drug': '',\n", " 'dose': '',\n", " 'time': '',\n", " 'case': ['70.0-80.0', '80.0-90.0'],\n", " 'control': '',\n", " 'dataset_type': '',\n", " 'gender': 'm',\n", " 'species': 'human',\n", " 'up': [],\n", " 'down': []}" ] }, "execution_count": 178, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prompt3" ] }, { "cell_type": "code", "execution_count": 177, "id": "ac9c4890", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'output': [None],\n", " 'mode': 'meta2diff2compound',\n", " 'message': '62149 is not in list',\n", " 'input': '[BOS]lung 70 EFO_0000768 expression 70.0-80.0 80.0-90.0 m human '}" ] }, "execution_count": 177, "metadata": {}, "output_type": "execute_result" } ], "source": [ "output" ] }, { "cell_type": "code", "execution_count": 167, "id": "09ec4fe2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Up-regulated genes:\n", "MUC5B; AHSP; ALAS2; SLC4A1; CDHR5; NXF2B; CYP4F3; LGALS7B; FBN3; NTS; CYSTM1; ORM2; ASL; CD177; GLRX5; H4C3; NDUFA3; TUBA4B; 
EPB42; GCHFR; KLF1; CFAP119; TRAPPC2L; DMTN; PDZK1IP1; SEM1; PCYT2; SERF2; CDC20; DAD1; MPC2; EMC3; BOLA1; CMTM5; PGD; EBP; GUK1; NDUFB7; UQCR11; LGALS9C; KEL; HBQ1; TUBB2A; RBX1; TMEM141; F8A1; COX7B; TMEM258; NDUFA7; MYL6; UQCRQ; MRPS24; HPGD; BOLA2B; KRTAP19-4; ATP5MF; RPL29; RPP25L; WDR83OS; FAU; UXT; ZNHIT1; SLC6A8\n", "\n", "Down-regulated genes:\n", "KRT6A; KRT5; KRT15; KRT14; KRT6B; DSG3; CALML3; S100A7; SERPINB5; SPRR2A; SPRR3; LY6D; TMEM45A; KRT16; S100A9; GOLGA8A; SPINK6; CXCL10; CXCL9; CSTA; DSC3; APOL1; CXCL8; PKIA; MYBL1; CYP26B1; POSTN; THBS1; ARL14; UPK1B; CXCL13; CXCL6; C1R; COL14A1; TNFAIP2; TIMP1; VEGFC; C1QB; COL15A1; MGP; BICC1; S100A2; XIST; MARCKS; TLR2; TYMP; RPS4Y1; COL1A1; KLF6; KRT17; FBN1; STK32B; KDM5D; SPP1; APOD; THBS2; EIF1AY; CD163; CCL8; SYNM; CD44; HSPA9; CD14; SOCS3; HSPA6; MCL1; ALOX5AP; PBX3; DDX21; IRF8; HMGA1; MAFB; RGS1; SERPINE1; FKBP5; NOVA1; GFPT2; RRP12; AGTR1; C3AR1; GBP1; CCL18; TLR4; IGSF6; MSMB; SERPINA3; HLA-DQA1; HSPB8; SLC2A1; FOXD1; MS4A14; NAMPT; FYB1; TCAF1; NCF2; SERPINA1; F13A1; GBP3; FHL2; VSIG4; IFI16; MRC1\n", "\n" ] } ], "source": [ "\n", "print(\"Up-regulated genes:\")\n", "print(*output['output']['up'][0], sep='; ', end=\"\\n\\n\")\n", "print(\"Down-regulated genes:\")\n", "print(*output['output']['down'][0], sep='; ', end=\"\\n\\n\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }