{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import numpy as np\n",
    "from models_cifm.cifm import CIFM\n",
    "import scanpy as sc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. load model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "18d58ba0049e4560b7bd0916fbd6ea33",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "model.safetensors:   0%|          | 0.00/569M [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "def load_model():\n",
    "    args_model = torch.load('./models_cifm/args.pt')\n",
    "    device = 'cpu' # or 'cuda' if you have a GPU\n",
    "    model = CIFM.from_pretrained('ynyou/CIFM', args=args_model).to(device)\n",
    "    model.channel2ensembl_ids_source = torch.load('./models_cifm/channel2ensembl.pt')\n",
    "    model.eval()\n",
    "    return model\n",
    "model = load_model()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. load and preprocess sample adata\n",
    "- some requirements for adata:\n",
    "- ```adata.X```: must contain the raw counts\n",
    "- ```adata.obsm['spatial']```: the coordinates of the cells, in micrometers\n",
    "- if the coordinates are in a different unit, the geometric graph may be badly formed: the model builds the graph with a radius of 20 (micrometers), so a different unit can make the graph overly sparse or overly dense"
   ]
  },
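  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch (not part of CIFM) for checking that a matrix looks like raw counts, i.e. finite, non-negative and integer-valued. The helper name `looks_like_raw_counts` is hypothetical, and a dense array is assumed; for a sparse ```adata.X```, its `.data` attribute (the stored non-zero values) could be passed instead.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def looks_like_raw_counts(values):\n",
    "    # hypothetical helper: True if all values are finite, non-negative integers\n",
    "    values = np.asarray(values, dtype=float)\n",
    "    if values.size == 0:\n",
    "        return True\n",
    "    is_finite = np.all(np.isfinite(values))\n",
    "    is_integer = np.all(values == np.round(values))\n",
    "    return bool(is_finite and values.min() >= 0 and is_integer)\n",
    "\n",
    "raw = np.array([[0, 3, 1], [2, 0, 5]])\n",
    "logged = np.log1p(raw)\n",
    "print(looks_like_raw_counts(raw))     # raw counts pass the check\n",
    "print(looks_like_raw_counts(logged))  # log-transformed values do not\n",
    "```"
   ]
  },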
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AnnData object with n_obs × n_vars = 24844 × 18289\n",
       "    obs: 'in_tissue'\n",
       "    var: 'feature_types', 'genome', 'gene_names'\n",
       "    uns: 'log1p'\n",
       "    obsm: 'spatial'\n",
       "    layers: 'counts'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
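   ],
   "source": [
    "adata = sc.read_h5ad('./adata.h5ad')\n",
    "# keep a copy of the raw counts before adata.X is normalized in place\n",
    "adata.layers['counts'] = adata.X.copy()\n",
    "# scale each cell to a total of 10,000 counts, then apply log(1 + x)\n",
    "sc.pp.normalize_total(adata, target_sum=1e4)\n",
    "sc.pp.log1p(adata)\n",
    "adata"
   ]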
  },
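  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see why the coordinate unit matters, the sketch below counts how many other cells fall within a radius of 20 around each cell. It is an illustration only, not the model's graph construction: the helper name `mean_neighbors_within_radius` is hypothetical, treating the radius as inclusive is an assumption, and the brute-force pairwise distances need memory quadratic in the number of cells, so a subsample would be used on real data.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def mean_neighbors_within_radius(coords, radius=20.0):\n",
    "    # hypothetical helper: average number of other cells within `radius` of each cell\n",
    "    coords = np.asarray(coords, dtype=float)\n",
    "    diffs = coords[:, None, :] - coords[None, :, :]\n",
    "    dists = np.sqrt((diffs ** 2).sum(axis=-1))\n",
    "    # subtract 1 so that a cell does not count itself\n",
    "    neighbor_counts = (dists <= radius).sum(axis=1) - 1\n",
    "    return float(neighbor_counts.mean())\n",
    "\n",
    "coords_um = np.array([[0, 0], [10, 0], [0, 10], [100, 100]])\n",
    "print(mean_neighbors_within_radius(coords_um))         # coordinates in micrometers\n",
    "print(mean_neighbors_within_radius(coords_um / 1000))  # same cells in a larger unit: every cell becomes a neighbor\n",
    "```\n",
    "\n",
    "An average close to zero, or close to the total number of cells, suggests that ```adata.obsm['spatial']``` is not in micrometers."
   ]
  },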
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. match feature channels\n",
    "- we need a list that maps feature channels to Ensembl IDs: ```channel2ensembl_ids_target```\n",
    "- format: ```channel2ensembl_ids_target = [[ensemblid1_for_channel1, ensemblid2_for_channel1, ...], [ensemblid1_for_channel2, ensemblid2_for_channel2, ...], ...]```\n",
    "- one channel can correspond to multiple Ensembl IDs, e.g., when the channels in your original data are annotated with gene names\n",
    "- you can use BioMart to map each gene name to one or more Ensembl IDs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "matching 18289 gene channels out of 18289 ; unmatched channels: []\n"
     ]
    }
   ],
   "source": [
    "# adata.var.index is used directly as the Ensembl ID of each channel, one ID per channel\n",
    "channel2ensembl_ids_target = [[i] for i in adata.var.index.tolist()]\n",
    "model.channel_matching(channel2ensembl_ids_target, model.channel2ensembl_ids_source)"
   ]
  },
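  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When the channels are annotated with gene names rather than Ensembl IDs, the nested list can be built from a lookup table. The sketch below assumes such a table already exists as a Python dict (for example, exported from BioMart); the helper `build_channel2ensembl` and the `GENE_X` entry with its placeholder IDs are made up for illustration. Names missing from the table get an empty list here, and how `channel_matching` treats an empty list is not covered by this notebook.\n",
    "\n",
    "```python\n",
    "def build_channel2ensembl(gene_names, symbol_to_ensembl):\n",
    "    # one list of Ensembl IDs per channel, in channel order;\n",
    "    # names missing from the table get an empty list\n",
    "    return [list(symbol_to_ensembl.get(name, [])) for name in gene_names]\n",
    "\n",
    "# hypothetical lookup table from gene name to Ensembl IDs\n",
    "symbol_to_ensembl = {\n",
    "    'TP53': ['ENSG00000141510'],\n",
    "    'GAPDH': ['ENSG00000111640'],\n",
    "    'GENE_X': ['ENSG_PLACEHOLDER_1', 'ENSG_PLACEHOLDER_2'],  # made-up multi-ID case\n",
    "}\n",
    "gene_names = ['GAPDH', 'GENE_X', 'NOT_IN_TABLE', 'TP53']\n",
    "channel2ensembl_ids_target = build_channel2ensembl(gene_names, symbol_to_ensembl)\n",
    "print(channel2ensembl_ids_target)\n",
    "```"
   ]
  },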
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. embed the microenvironments centered at each cell"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(tensor([[-0.4326, -0.8625,  0.1121,  ...,  0.4980,  0.3855, -0.1965],\n",
       "         [-0.6833, -0.9950,  0.1927,  ..., -0.2064,  0.6193,  0.0387],\n",
       "         [-0.2099, -0.9877,  0.3462,  ...,  0.2102,  0.6807, -0.2155],\n",
       "         ...,\n",
       "         [-0.0187, -0.8444,  0.3058,  ...,  0.1030,  0.8362, -0.1859],\n",
       "         [-0.5535, -0.8201,  0.7805,  ..., -0.1402,  0.5221, -0.3520],\n",
       "         [-0.9339, -0.8467,  0.0600,  ...,  0.0406,  0.3608,  0.3418]]),\n",
       " torch.Size([24844, 1024]))"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "with torch.no_grad():\n",
    "    embeddings = model.embed(adata)\n",
    "embeddings, embeddings.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. infer the potential gene expressions at certain locations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(tensor([[0.0000, 0.0000, 2.8781,  ..., 0.0000, 0.0000, 0.0000],\n",
       "         [0.0000, 0.0000, 2.9699,  ..., 0.0000, 0.0000, 0.0000],\n",
       "         [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],\n",
       "         ...,\n",
       "         [0.0000, 0.0000, 3.2570,  ..., 0.0000, 0.0000, 0.0000],\n",
       "         [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],\n",
       "         [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000]]),\n",
       " torch.Size([10, 18289]))"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# randomly generate target locations inside the bounding box of the observed cells, for demonstration only\n",
    "target_locs = np.random.rand(10, 2)\n",
    "x_min, x_max = adata.obsm['spatial'][:, 0].min(), adata.obsm['spatial'][:, 0].max()\n",
    "y_min, y_max = adata.obsm['spatial'][:, 1].min(), adata.obsm['spatial'][:, 1].max()\n",
    "target_locs[:, 0] = target_locs[:, 0] * (x_max - x_min) + x_min\n",
    "target_locs[:, 1] = target_locs[:, 1] * (y_max - y_min) + y_min\n",
    "\n",
    "with torch.no_grad():\n",
    "    expressions = model.predict_cells_at_locations(adata, target_locs)\n",
    "expressions, expressions.shape"
   ]
  },
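  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The predictions can be mapped back from the log(1 + x) scale with `expm1` and rescaled so that each cell sums to 1, as the next cell does. The sketch below shows the same arithmetic on a small NumPy array and additionally guards against all-zero rows, which would otherwise divide by zero. The helper name `log1p_to_proportions` is hypothetical, and the assumption that the inputs are on the log1p scale follows from the preprocessing in step 2.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def log1p_to_proportions(log_expr):\n",
    "    # invert log(1 + x), then divide each row by its total\n",
    "    counts = np.expm1(np.asarray(log_expr, dtype=float))\n",
    "    totals = counts.sum(axis=1, keepdims=True)\n",
    "    # rows whose total is zero stay all-zero instead of becoming NaN\n",
    "    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)\n",
    "\n",
    "log_expr = np.log1p(np.array([[1.0, 3.0], [0.0, 0.0]]))\n",
    "proportions = log1p_to_proportions(log_expr)\n",
    "print(proportions)\n",
    "```"
   ]
  },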
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(tensor([[0.0000, 0.0000, 0.0002,  ..., 0.0000, 0.0000, 0.0000],\n",
       "         [0.0000, 0.0000, 0.0002,  ..., 0.0000, 0.0000, 0.0000],\n",
       "         [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],\n",
       "         ...,\n",
       "         [0.0000, 0.0000, 0.0003,  ..., 0.0000, 0.0000, 0.0000],\n",
       "         [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],\n",
       "         [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000]]),\n",
       " torch.Size([10, 18289]))"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# convert the predictions back from the log1p scale to normalized counts (each row sums to 1)\n",
    "counts_normalized = np.exp(expressions) - 1\n",
    "counts_normalized = counts_normalized / counts_normalized.sum(axis=1, keepdims=True)\n",
    "counts_normalized, counts_normalized.shape"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}