ZJU-Fangyin committed on
Commit 836a12a • 1 Parent(s): 0fc24a7
Create README.md
README.md ADDED
@@ -0,0 +1,138 @@
---
license: mit
pipeline_tag: text-generation
tags:
- biology
- single-cell
- single-cell analysis
- text-generation-inference
---

<div align="center">

[![Code License](https://img.shields.io/badge/Code%20License-MIT-green.svg)](https://github.com/zjunlp/ChatCell/blob/main/LICENSE)
[![Data License](https://img.shields.io/badge/Data%20License-CC%20BY%204.0-red.svg)](https://github.com/zjunlp/ChatCell/blob/main/DATA_LICENSE)

![image.png](./figure/logo.png)

<h2 align="center"> <img src="figure/logo.png" width="8%" height="18%"> ChatCell: Facilitating Single-Cell Analysis with Natural Language </h2>

<p align="center">
<a href="https://www.zjukg.org/project/ChatCell">💻 Project Page</a> •
<a href="https://huggingface.co/datasets/zjunlp/Single-cell-Instructions">🤗 Dataset</a> •
<a href="https://huggingface.co/spaces/zjunlp/Chatcell">🍎 Demo</a> •
<a href="#1">🏖️ Overview</a> •
<a href="#2">🧬 Single-cell Analysis Tasks</a> •
<a href="#3">🛠️ Quickstart</a> •
<a href="#4">📝 Cite</a>
</p>

<div align=center><img src="figure/intro.gif" width="60%" height="100%" /></div>
<b>ChatCell</b> lets researchers give instructions in either natural language or single-cell language to carry out the tasks they need in single-cell analysis. Black text denotes human language and red text denotes single-cell language.

</div>

## 🆕 News

- **\[Feb 2024\]** We released the model weights and datasets.

## 📌 Table of Contents

- [🏖️ Overview](#1)
- [🧬 Single-cell Analysis Tasks](#2)
- [🛠️ Quickstart](#3)
- [📝 Cite](#4)

---

<h2 id="1">🏖️ Overview</h2>

**Background**
- Single-cell biology examines the intricate functions of cells, from energy production to genetic information transfer, and plays a critical role in unraveling the fundamental principles of life and the mechanisms that influence health and disease.
- The field has witnessed a surge in single-cell RNA sequencing (scRNA-seq) data, driven by advancements in high-throughput sequencing and reduced costs.
- Traditional single-cell foundation models leverage extensive scRNA-seq datasets, applying NLP techniques during pre-training to analyze gene expression matrices (structured formats that simplify scRNA-seq data into computationally tractable representations). They are subsequently fine-tuned for distinct single-cell analysis tasks, as shown in Figure 1(a).

<p align="center">
<img src="figure/overview.jpg" width="100%" height="60%">
</p>
<div align="center">
Figure 1: (a) Comparison of traditional single-cell engineering and <b>ChatCell</b>. (b) Overview of <b>ChatCell</b>.
</div>
<br>
We present <b>ChatCell</b>, a new paradigm that leverages natural language to make single-cell analysis more accessible and intuitive.

- Initially, we convert scRNA-seq data into a single-cell language that LLMs can readily interpret.
- Subsequently, we employ templates to integrate this single-cell language with task descriptions and target outcomes, creating comprehensive single-cell instructions.
- To improve the LLM's expertise in the single-cell domain, we conduct vocabulary adaptation, enriching the model with a specialized single-cell lexicon.
- Following this, we utilize unified sequence generation to empower the model to adeptly execute a range of single-cell tasks.
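
The first three steps can be illustrated with a minimal sketch. It is not the ChatCell implementation: it assumes that a cell sentence is built by ranking genes by descending expression and keeping the top `k` gene names, and the helper names (`to_cell_sentence`, `build_instruction`, `new_vocab_tokens`), the template wording, and the toy expression values are all hypothetical.

```python
from typing import Dict, List, Set


def to_cell_sentence(expression: Dict[str, float], top_k: int = 100) -> str:
    """Turn one cell's expression profile into a space-separated gene list.

    Assumption: genes are ordered by descending expression and genes with
    zero counts are dropped. Ties are broken alphabetically so the output
    is deterministic.
    """
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    ranked = sorted(expressed, key=lambda item: (-item[1], item[0]))
    return " ".join(gene for gene, _ in ranked[:top_k])


# Hypothetical template for a cell type annotation instruction.
ANNOTATION_TEMPLATE = "Identify the cell type of the following cell: {cell_sentence}"


def build_instruction(cell_sentence: str, target: str) -> Dict[str, str]:
    """Combine a task description, a cell sentence, and the target outcome."""
    return {
        "source": ANNOTATION_TEMPLATE.format(cell_sentence=cell_sentence),
        "target": target,
    }


def new_vocab_tokens(cell_sentences: List[str], known_vocab: Set[str]) -> List[str]:
    """List gene symbols that an existing vocabulary does not contain yet."""
    seen = {gene for sentence in cell_sentences for gene in sentence.split()}
    return sorted(seen - known_vocab)


# Toy expression profile (made-up values).
cell = {"MALAT1": 9.0, "CD3E": 4.5, "ACTB": 7.2, "GAPDH": 0.0}
sentence = to_cell_sentence(cell, top_k=3)
example = build_instruction(sentence, target="T cell")

print(sentence)
print(example["source"])
print(new_vocab_tokens([sentence], known_vocab={"ACTB"}))
```

With Hugging Face Transformers, missing tokens of this kind are typically registered with `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`; whether ChatCell performs vocabulary adaptation exactly this way is not stated in this README.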

<h2 id="2">🧬 Single-cell Analysis Tasks</h2>

We concentrate on the following single-cell tasks:

- <b>Random Cell Sentence Generation.</b>
Random cell sentence generation challenges the model to create cell sentences devoid of predefined biological conditions or constraints. This task aims to evaluate the model's ability to generate valid and contextually appropriate cell sentences, potentially simulating natural variations in cellular behavior.

<p align="center">
<img src="figure/example1.jpg" width="80%" height="60%">
</p>

- <b>Pseudo-cell Generation.</b>
Pseudo-cell generation focuses on generating gene sequences tailored to specific cell type labels. This task is vital for unraveling gene expression and regulation across different cell types, offering insights for medical research and disease studies, particularly in the context of diseased cell types.

<p align="center">
<img src="figure/example2.jpg" width="80%" height="60%">
</p>

- <b>Cell Type Annotation.</b>
For cell type annotation, the model is tasked with precisely classifying cells into their respective types based on gene expression patterns encapsulated in cell sentences. This task is fundamental for understanding cellular functions and interactions within tissues and organs, playing a crucial role in developmental biology and regenerative medicine.

<p align="center">
<img src="figure/example3.jpg" width="80%" height="60%">
</p>

- <b>Drug Sensitivity Prediction.</b>
The drug sensitivity prediction task aims to predict the response of different cells to various drugs. It is pivotal in designing effective, personalized treatment plans and contributes significantly to drug development, especially in optimizing drug efficacy and safety.

<p align="center">
<img src="figure/example4.jpg" width="80%" height="60%">
</p>
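
For the generation tasks above, one simple way to check that an output is a well-formed cell sentence is to measure how many of its tokens are known gene symbols and whether any gene is repeated. The sketch below is only an assumption about what "valid" could mean here, not the evaluation used by ChatCell; `cell_sentence_stats` and the small gene set are hypothetical.

```python
from typing import Dict, Set


def cell_sentence_stats(sentence: str, known_genes: Set[str]) -> Dict[str, float]:
    """Summarize how well-formed a generated cell sentence looks.

    valid_fraction: share of tokens that are known gene symbols.
    unique_fraction: share of distinct tokens (1.0 means no repeats).
    """
    tokens = sentence.split()
    if not tokens:
        return {"n_tokens": 0, "valid_fraction": 0.0, "unique_fraction": 0.0}
    valid = sum(1 for token in tokens if token in known_genes)
    return {
        "n_tokens": len(tokens),
        "valid_fraction": valid / len(tokens),
        "unique_fraction": len(set(tokens)) / len(tokens),
    }


# Toy reference set of gene symbols (illustrative only).
known = {"MALAT1", "ACTB", "CD3E", "GAPDH"}
stats = cell_sentence_stats("MALAT1 ACTB ACTB FOO", known)
print(stats)
```

In this toy input, one token is not a known gene and one gene is repeated, so both fractions fall below 1.0.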

<h2 id="3">🛠️ Quickstart</h2>

- **📚 Prepare the data**

**Step 1:**

**Step 2:**

- **🔨 Train**

- **⌨️ Generate**

- **🔍 Evaluate**

<h2 id="4">📝 Cite</h2>

If you use our repository, please cite the following related paper:
```
@article{fang2024chatcell,
  title={ChatCell: Facilitating Single-Cell Analysis with Natural Language},
  author={Fang, Yin and Liu, Kangwei and Zhang, Ningyu and Deng, Xinle and Yang, Penghui and Chen, Zhuo and Tang, Xiangru and Gerstein, Mark and Fan, Xiaohui and Chen, Huajun},
  journal={arXiv preprint arXiv:2402.08303},
  year={2024},
}
```