vnaumov committed on
Commit
bc7aaea
1 Parent(s): 22a6dee

Upload Precious3GPT_example.ipynb

Files changed (1)
  1. Precious3GPT_example.ipynb +407 -0
Precious3GPT_example.ipynb ADDED
@@ -0,0 +1,407 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 70,
6
+ "id": "c7317218",
7
+ "metadata": {},
8
+ "outputs": [],
9
+ "source": [
10
+ "import requests\n",
11
+ "from copy import copy as cp\n"
12
+ ]
13
+ },
14
+ {
15
+ "cell_type": "markdown",
16
+ "id": "c022e07b",
17
+ "metadata": {},
18
+ "source": [
19
+ "## Authorize with the endpoint"
20
+ ]
21
+ },
22
+ {
23
+ "cell_type": "code",
24
+ "execution_count": 2,
25
+ "id": "f1272e3f",
26
+ "metadata": {},
27
+ "outputs": [],
28
+ "source": [
29
+ "API_URL = \"https://YOUR.ENDPOINT.aws.endpoints.huggingface.cloud\"\n",
30
+ "headers = {\n",
31
+ " \"Accept\" : \"application/json\",\n",
32
+ " \"Authorization\": \"Bearer hf_YOUR_TOKEN\",\n",
33
+ " \"Content-Type\": \"application/json\"\n",
34
+ "}\n",
35
+ "\n",
36
+ "def query(payload):\n",
37
+ " response = requests.post(API_URL, headers=headers, json=payload)\n",
38
+ " return response.json()"
39
+ ]
40
+ },
41
+ {
42
+ "cell_type": "markdown",
43
+ "id": "082c3300",
44
+ "metadata": {},
45
+ "source": [
46
+ "## Construct the query\n",
47
+ "Instructions define what type of experiment you are trying to simulate with P3GPT.<br>\n",
48
+ "Key instructions enabled at this endpoint include:\n",
49
+ "- <font size=\"4\">**`disease2diff2disease`**</font>: For tasks that are equivalent to case-control cross-sectional settings. E.g. the generation of DEGs for a medical condition;\n",
50
+ "- <font size=\"4\">**`compound2diff2compound`**</font>: For compound screening tasks. E.g. propose a compound that can selectively methylate certain gene promoters;\n",
51
+ "- <font size=\"4\">**`age_group2diff2age_group`**</font>: For tasks on aging-related omics dynamics. E.g. identify genes that are up-/down-regulated in older vs. younger adults.\n"
52
+ ]
53
+ },
54
+ {
55
+ "cell_type": "code",
56
+ "execution_count": 139,
57
+ "id": "fd84fc60",
58
+ "metadata": {},
59
+ "outputs": [],
60
+ "source": [
61
+ "prompt = {'instruction': ['age_group2diff2age_group','compound2diff2compound'], \n",
62
+ " # This is a chemical screening experiment in a particular age group, \n",
63
+ " # so you'll need to use 2 instructions\n",
64
+ " 'tissue': 'lung',\n",
65
+ " 'age': 70,\n",
66
+ " 'cell': '',\n",
67
+ " 'efo': 'EFO_0000768', #pulmonary fibrosis\n",
68
+ " 'datatype': 'expression', # we want to get DEGs\n",
69
+ " 'drug': 'curcumin',\n",
70
+ " 'dose': '',\n",
71
+ " 'time': '',\n",
72
+ " 'case': ['70.0-80.0', '80.0-90.0'], # define the age groups of interest\n",
73
+ " 'control': '', # left blank since no healthy controls participate in this experiment\n",
74
+ " 'dataset_type': '',\n",
75
+ " 'gender': 'm',\n",
76
+ " 'species': 'human',\n",
77
+ " 'up': [], # left blank to be filled in by P3GPT\n",
78
+ " 'down': []\n",
79
+ " }\n",
80
+ "\n"
81
+ ]
82
+ },
83
+ {
84
+ "cell_type": "markdown",
85
+ "id": "609bd3c0",
86
+ "metadata": {},
87
+ "source": [
88
+ "## Execution modes\n",
89
+ "- <font size=\"4\">**`meta2diff`**</font>: The `compound2diff2compound` instruction can be executed in either direction. This mode tells P3GPT to return differentially expressed genes rather than compounds;\n",
90
+ "- <font size=\"4\">**`diff2compound`**</font>: The reverse of the `meta2diff` mode. Make sure to fill in 'up' and 'down' in the prompt first!\n",
91
+ "- <font size=\"4\">**`meta2diff2compound`**</font>: Runs `meta2diff` first and applies `diff2compound` to its output. This is mainly a convenience: P3GPT runs twice with a single call.\n",
92
+ "\n",
93
+ "As an LLM, P3GPT is trained to fill in the blanks in its prompt that the instructions point to. Its native output has the same structure as the input prompt.<br>\n",
94
+ "Modes do not belong in the prompt and are used for parsing P3GPT's output so that only the expected part of the completed prompt is presented to the user."
95
+ ]
96
+ },
97
+ {
98
+ "cell_type": "code",
99
+ "execution_count": 140,
100
+ "id": "c6280337",
101
+ "metadata": {},
102
+ "outputs": [],
103
+ "source": [
104
+ "config_sample = {'inputs': prompt,\n",
105
+ " 'mode': 'meta2diff', # return DEGs rather than compounds\n",
106
+ " 'parameters': {'temperature': 0.4,\n",
107
+ " 'top_p': 0.8,\n",
108
+ " 'top_k': 3550,\n",
109
+ " 'n_next_tokens': 20}\n",
110
+ " }\n",
111
+ "output = query(config_sample) # send request to Hugging Face"
112
+ ]
113
+ },
114
+ {
115
+ "cell_type": "code",
116
+ "execution_count": 141,
117
+ "id": "47a3f882",
118
+ "metadata": {},
119
+ "outputs": [
120
+ {
121
+ "name": "stdout",
122
+ "output_type": "stream",
123
+ "text": [
124
+ "dict_keys(['output', 'mode', 'message', 'input'])\n"
125
+ ]
126
+ }
127
+ ],
128
+ "source": [
129
+ "print(output.keys())"
130
+ ]
131
+ },
132
+ {
133
+ "cell_type": "code",
134
+ "execution_count": 142,
135
+ "id": "5408079c",
136
+ "metadata": {},
137
+ "outputs": [
138
+ {
139
+ "data": {
140
+ "text/plain": [
141
+ "'Done!'"
142
+ ]
143
+ },
144
+ "execution_count": 142,
145
+ "metadata": {},
146
+ "output_type": "execute_result"
147
+ }
148
+ ],
149
+ "source": [
150
+ "# successful generation\n",
151
+ "output['message']"
152
+ ]
153
+ },
154
+ {
155
+ "cell_type": "code",
156
+ "execution_count": 143,
157
+ "id": "f51d4314",
158
+ "metadata": {},
159
+ "outputs": [
160
+ {
161
+ "data": {
162
+ "text/plain": [
163
+ "'[BOS]<age_group2diff2age_group><compound2diff2compound><tissue>lung </tissue><age_individ>70 </age_individ><cell></cell><efo>EFO_0000768 </efo><datatype>expression </datatype><drug>curcumin </drug><dose></dose><time></time><case>70.0-80.0 80.0-90.0 </case><control></control><dataset_type></dataset_type><gender>m </gender><species>human </species>'"
164
+ ]
165
+ },
166
+ "execution_count": 143,
167
+ "metadata": {},
168
+ "output_type": "execute_result"
169
+ }
170
+ ],
171
+ "source": [
172
+ "# this is what actual P3GPT input looks like\n",
173
+ "# NB: there is no 'mode' in the prompt. \n",
174
+ "output['input']"
175
+ ]
176
+ },
177
+ {
178
+ "cell_type": "code",
179
+ "execution_count": 144,
180
+ "id": "08c9f49a",
181
+ "metadata": {},
182
+ "outputs": [
183
+ {
184
+ "name": "stdout",
185
+ "output_type": "stream",
186
+ "text": [
187
+ "Up-regulated genes:\n",
188
+ "MUC5B; AHSP; ALAS2; SLC4A1; CDHR5; NXF2B; CYP4F3; LGALS7B; FBN3; NTS; CYSTM1; ORM2; ASL; CD177; GLRX5; H4C3; NDUFA3; TUBA4B; EPB42; GCHFR\n",
189
+ "\n",
190
+ "Down-regulated genes:\n",
191
+ "KRT6A; KRT5; KRT15; KRT14; KRT6B; DSG3; CALML3; S100A7; SERPINB5; SPRR2A; SPRR3; LY6D; TMEM45A; KRT16; S100A9; GOLGA8A; SPINK6; CXCL10; CXCL9; CSTA\n",
192
+ "\n"
193
+ ]
194
+ }
195
+ ],
196
+ "source": [
197
+ "# output gene symbols\n",
198
+ "genes_up, genes_dn = output['output']['up'][0], output['output']['down'][0]\n",
199
+ "print(\"Up-regulated genes:\")\n",
200
+ "print(*genes_up[:20], sep = \"; \",end='\\n\\n')\n",
201
+ "print(\"Down-regulated genes:\")\n",
202
+ "print(*genes_dn[:20], sep = \"; \",end='\\n\\n')\n"
203
+ ]
204
+ },
205
+ {
206
+ "cell_type": "code",
207
+ "execution_count": 145,
208
+ "id": "f6910a3d",
209
+ "metadata": {},
210
+ "outputs": [],
211
+ "source": [
212
+ "# now, let's do the opposite and get compounds based on these DEG lists\n",
213
+ "# to do that, we only need a couple changes to the original prompt\n",
214
+ "prompt2 = cp(prompt)\n",
215
+ "prompt2.update({\n",
216
+ " 'drug':'',\n",
217
+ " 'up':genes_up,\n",
218
+ " 'down':genes_dn\n",
219
+ " })\n",
220
+ "# remember to switch the mode from meta2diff to diff2compound!\n",
221
+ "config_sample.update({'mode':'diff2compound',\n",
222
+ " 'inputs':prompt2})"
223
+ ]
224
+ },
225
+ {
226
+ "cell_type": "code",
227
+ "execution_count": 146,
228
+ "id": "e791e285",
229
+ "metadata": {},
230
+ "outputs": [],
231
+ "source": [
232
+ "output = query(config_sample) # send request to Hugging Face"
233
+ ]
234
+ },
235
+ {
236
+ "cell_type": "code",
237
+ "execution_count": 127,
238
+ "id": "8ae15313",
239
+ "metadata": {},
240
+ "outputs": [
241
+ {
242
+ "data": {
243
+ "text/plain": [
244
+ "dict_keys(['output', 'compounds', 'raw_output', 'mode', 'message', 'input'])"
245
+ ]
246
+ },
247
+ "execution_count": 127,
248
+ "metadata": {},
249
+ "output_type": "execute_result"
250
+ }
251
+ ],
252
+ "source": [
253
+ "output.keys()"
254
+ ]
255
+ },
256
+ {
257
+ "cell_type": "code",
258
+ "execution_count": 147,
259
+ "id": "5f35f00c",
260
+ "metadata": {},
261
+ "outputs": [
262
+ {
263
+ "name": "stdout",
264
+ "output_type": "stream",
265
+ "text": [
266
+ "artemisinin; todralazine; dyphylline; esmolol; formestane; z160; netupitant; brd-k89304341; isoprenaline\n"
267
+ ]
268
+ }
269
+ ],
270
+ "source": [
271
+ "print(*output['compounds'][0], sep='; ')"
272
+ ]
273
+ },
274
+ {
275
+ "cell_type": "code",
276
+ "execution_count": 175,
277
+ "id": "5d883cf8",
278
+ "metadata": {},
279
+ "outputs": [],
280
+ "source": [
281
+ "# alternatively, use the meta2diff2compound mode to go straight to compounds\n",
282
+ "prompt3 = cp(prompt)\n",
283
+ "prompt3.update({'instruction':['compound2diff2compound']})\n",
284
+ "config_sample.update({'mode':'meta2diff2compound',\n",
285
+ " 'inputs':prompt3})"
286
+ ]
287
+ },
288
+ {
289
+ "cell_type": "code",
290
+ "execution_count": 176,
291
+ "id": "c2adb995",
292
+ "metadata": {},
293
+ "outputs": [],
294
+ "source": [
295
+ "output = query(config_sample)"
296
+ ]
297
+ },
298
+ {
299
+ "cell_type": "code",
300
+ "execution_count": 178,
301
+ "id": "99da6eb8",
302
+ "metadata": {},
303
+ "outputs": [
304
+ {
305
+ "data": {
306
+ "text/plain": [
307
+ "{'instruction': ['compound2diff2compound'],\n",
308
+ " 'tissue': 'lung',\n",
309
+ " 'age': 70,\n",
310
+ " 'cell': '',\n",
311
+ " 'efo': 'EFO_0000768',\n",
312
+ " 'datatype': 'expression',\n",
313
+ " 'drug': '',\n",
314
+ " 'dose': '',\n",
315
+ " 'time': '',\n",
316
+ " 'case': ['70.0-80.0', '80.0-90.0'],\n",
317
+ " 'control': '',\n",
318
+ " 'dataset_type': '',\n",
319
+ " 'gender': 'm',\n",
320
+ " 'species': 'human',\n",
321
+ " 'up': [],\n",
322
+ " 'down': []}"
323
+ ]
324
+ },
325
+ "execution_count": 178,
326
+ "metadata": {},
327
+ "output_type": "execute_result"
328
+ }
329
+ ],
330
+ "source": [
331
+ "prompt3"
332
+ ]
333
+ },
334
+ {
335
+ "cell_type": "code",
336
+ "execution_count": 177,
337
+ "id": "ac9c4890",
338
+ "metadata": {},
339
+ "outputs": [
340
+ {
341
+ "data": {
342
+ "text/plain": [
343
+ "{'output': [None],\n",
344
+ " 'mode': 'meta2diff2compound',\n",
345
+ " 'message': '62149 is not in list',\n",
346
+ " 'input': '[BOS]<compound2diff2compound><tissue>lung </tissue><age_individ>70 </age_individ><cell></cell><efo>EFO_0000768 </efo><datatype>expression </datatype><drug></drug><dose></dose><time></time><case>70.0-80.0 80.0-90.0 </case><control></control><dataset_type></dataset_type><gender>m </gender><species>human </species>'}"
347
+ ]
348
+ },
349
+ "execution_count": 177,
350
+ "metadata": {},
351
+ "output_type": "execute_result"
352
+ }
353
+ ],
354
+ "source": [
355
+ "output"
356
+ ]
357
+ },
358
+ {
359
+ "cell_type": "code",
360
+ "execution_count": 167,
361
+ "id": "09ec4fe2",
362
+ "metadata": {},
363
+ "outputs": [
364
+ {
365
+ "name": "stdout",
366
+ "output_type": "stream",
367
+ "text": [
368
+ "Up-regulated genes:\n",
369
+ "MUC5B; AHSP; ALAS2; SLC4A1; CDHR5; NXF2B; CYP4F3; LGALS7B; FBN3; NTS; CYSTM1; ORM2; ASL; CD177; GLRX5; H4C3; NDUFA3; TUBA4B; EPB42; GCHFR; KLF1; CFAP119; TRAPPC2L; DMTN; PDZK1IP1; SEM1; PCYT2; SERF2; CDC20; DAD1; MPC2; EMC3; BOLA1; CMTM5; PGD; EBP; GUK1; NDUFB7; UQCR11; LGALS9C; KEL; HBQ1; TUBB2A; RBX1; TMEM141; F8A1; COX7B; TMEM258; NDUFA7; MYL6; UQCRQ; MRPS24; HPGD; BOLA2B; KRTAP19-4; ATP5MF; RPL29; RPP25L; WDR83OS; FAU; UXT; ZNHIT1; SLC6A8\n",
370
+ "\n",
371
+ "Down-regulated genes:\n",
372
+ "KRT6A; KRT5; KRT15; KRT14; KRT6B; DSG3; CALML3; S100A7; SERPINB5; SPRR2A; SPRR3; LY6D; TMEM45A; KRT16; S100A9; GOLGA8A; SPINK6; CXCL10; CXCL9; CSTA; DSC3; APOL1; CXCL8; PKIA; MYBL1; CYP26B1; POSTN; THBS1; ARL14; UPK1B; CXCL13; CXCL6; C1R; COL14A1; TNFAIP2; TIMP1; VEGFC; C1QB; COL15A1; MGP; BICC1; S100A2; XIST; MARCKS; TLR2; TYMP; RPS4Y1; COL1A1; KLF6; KRT17; FBN1; STK32B; KDM5D; SPP1; APOD; THBS2; EIF1AY; CD163; CCL8; SYNM; CD44; HSPA9; CD14; SOCS3; HSPA6; MCL1; ALOX5AP; PBX3; DDX21; IRF8; HMGA1; MAFB; RGS1; SERPINE1; FKBP5; NOVA1; GFPT2; RRP12; AGTR1; C3AR1; GBP1; CCL18; TLR4; IGSF6; MSMB; SERPINA3; HLA-DQA1; HSPB8; SLC2A1; FOXD1; MS4A14; NAMPT; FYB1; TCAF1; NCF2; SERPINA1; F13A1; GBP3; FHL2; VSIG4; IFI16; MRC1\n",
373
+ "\n"
374
+ ]
375
+ }
376
+ ],
377
+ "source": [
378
+ "\n",
379
+ "print(\"Up-regulated genes:\")\n",
380
+ "print(*output['output']['up'][0], sep='; ', end=\"\\n\\n\")\n",
381
+ "print(\"Down-regulated genes:\")\n",
382
+ "print(*output['output']['down'][0], sep='; ', end=\"\\n\\n\")"
383
+ ]
384
+ }
385
+ ],
386
+ "metadata": {
387
+ "kernelspec": {
388
+ "display_name": "Python 3 (ipykernel)",
389
+ "language": "python",
390
+ "name": "python3"
391
+ },
392
+ "language_info": {
393
+ "codemirror_mode": {
394
+ "name": "ipython",
395
+ "version": 3
396
+ },
397
+ "file_extension": ".py",
398
+ "mimetype": "text/x-python",
399
+ "name": "python",
400
+ "nbconvert_exporter": "python",
401
+ "pygments_lexer": "ipython3",
402
+ "version": "3.9.5"
403
+ }
404
+ },
405
+ "nbformat": 4,
406
+ "nbformat_minor": 5
407
+ }
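
Two things in the notebook above are easy to miss: the prompt dict is flattened into a tagged string before it reaches P3GPT (visible in `output['input']`), and a failed call still returns a JSON body, with the error in `message` and `[None]` in `output` (as in the `meta2diff2compound` cell). The sketch below illustrates both. It is not the endpoint's code: `serialize_prompt` and `check_response` are hypothetical helper names, the serializer only reconstructs the string format visible in the notebook output (including the `age` to `age_individ` renaming), and it skips `up` and `down` because the notebook does not show how filled-in gene lists are encoded. Treating any `message` other than `'Done!'` as a failure is likewise an assumption based on the two responses shown.

```python
def serialize_prompt(prompt):
    """Rebuild the tagged string seen in output['input'] (hypothetical reconstruction)."""
    tag_names = {'age': 'age_individ'}  # the only renaming visible in the notebook output
    parts = ['[BOS]']
    parts += [f'<{instruction}>' for instruction in prompt['instruction']]
    for key, value in prompt.items():
        if key in ('instruction', 'up', 'down'):
            continue  # up/down encoding is not shown in the notebook, so it is skipped here
        tag = tag_names.get(key, key)
        values = value if isinstance(value, list) else [value]
        # each non-empty value is followed by a single space, as in the observed string
        body = ''.join(f'{v} ' for v in values if v != '')
        parts.append(f'<{tag}>{body}</{tag}>')
    return ''.join(parts)


def check_response(output):
    """Return the response unchanged, or raise if 'message' is not 'Done!' (assumed success marker)."""
    if output.get('message') != 'Done!':
        raise RuntimeError(f"P3GPT call failed: {output.get('message')!r}")
    return output


# Same fields as the notebook's `prompt`
example_prompt = {
    'instruction': ['age_group2diff2age_group', 'compound2diff2compound'],
    'tissue': 'lung', 'age': 70, 'cell': '', 'efo': 'EFO_0000768',
    'datatype': 'expression', 'drug': 'curcumin', 'dose': '', 'time': '',
    'case': ['70.0-80.0', '80.0-90.0'], 'control': '', 'dataset_type': '',
    'gender': 'm', 'species': 'human', 'up': [], 'down': [],
}

print(serialize_prompt(example_prompt))
```

Wrapping each call as `output = check_response(query(config_sample))` would surface a message such as `'62149 is not in list'` immediately, instead of failing later when `output['output']['up']` is indexed.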