Update README.md
README.md CHANGED
@@ -10,6 +10,7 @@ tags:



+
 <div align="center">

 <img src="./figures/logo.png" alt="image" width=8%>
@@ -17,10 +18,9 @@ tags:
 <h2 align="center"> ChatCell: Facilitating Single-Cell Analysis with Natural Language </h2>

 <p align="center">
-<a href="https://
+<a href="https://chat.openai.com/g/g-vUwj222gQ-chatcell">💻GPTStore App</a> •
 <a href="https://huggingface.co/datasets/zjunlp/ChatCell-Instructions">🤗 Dataset</a> •
 <a href="https://huggingface.co/spaces/zjunlp/Chatcell">🍎 Demo</a> •
-<a href="https://arxiv.org/abs/2402.08303">📑 Paper</a> •
 <a href="#1">🏖️ Overview</a> •
 <a href="#2">🧬 Single-cell Analysis Tasks</a> •
 <a href="#3">🛠️ Quickstart</a> •
@@ -36,37 +36,32 @@ tags:

 ## 📌 Table of Contents

-- [🏖️ Overview](#1)
-- [🧬 Single-cell Analysis Tasks](#2)
-- [🛠️ Quickstart](#3)
-- [📝 Cite](#4)
+- [🛠️ Quickstart](#2)
+- [🧬 Single-cell Analysis Tasks](#3)
+- [✨ Acknowledgements](#4)
+- [📝 Cite](#5)

 ---

-<h2 id="1">🏖️ Overview</h2>
+<h2 id="3">🛠️ Quickstart</h2>

-- The field has witnessed a surge in single-cell RNA sequencing (scRNA-seq) data, driven by advancements in high-throughput sequencing and reduced costs.
-- Traditional single-cell foundation models leverage extensive scRNA-seq datasets, applying NLP techniques to analyze gene expression matrices—structured formats that simplify scRNA-seq data into computationally tractable representations—during pre-training. They are subsequently fine-tuned for distinct single-cell analysis tasks, as shown in Figure (a).
+```python
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

+tokenizer = AutoTokenizer.from_pretrained("zjunlp/chatcell-large")
+model = AutoModelForSeq2SeqLM.from_pretrained("zjunlp/chatcell-large")
+input_text="Detail the 100 starting genes for a Mix, ranked by expression level: "
+# Encode the input text and generate a response with specified generation parameters
+input_ids = tokenizer(input_text,return_tensors="pt").input_ids
+output_ids = model.generate(input_ids, max_length=512, num_return_sequences=1, no_repeat_ngram_size=2, top_k=50, top_p=0.95, do_sample=True)
+# Decode and print the generated output text
+output_text = tokenizer.decode(output_ids[0],skip_special_tokens=True)
+print(output_text)
+```

-- Initially, we convert scRNA-seq data into a single-cell language that LLMs can readily interpret.
-- Subsequently, we employ templates to integrate this single-cell language with task descriptions and target outcomes, creating comprehensive single-cell instructions.
-- To improve the LLM's expertise in the single-cell domain, we conduct vocabulary adaptation, enriching the model with a specialized single-cell lexicon.
-- Following this, we utilize unified sequence generation to empower the model to adeptly execute a range of single-cell tasks.


-<h2 id="2">🧬 Single-cell Analysis Tasks</h2>
+<h2 id="3">🧬 Single-cell Analysis Tasks</h2>

 We concentrate on the following single-cell tasks:

@@ -101,32 +96,18 @@ The drug sensitivity prediction task aims to predict the response of different c
 <img src="./figures/example4.jpg" alt="image" width=80%>
 </p>

-<h2 id="3">🛠️ Quickstart</h2>
-
-```python
-from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
-
-tokenizer = AutoTokenizer.from_pretrained("zjunlp/chatcell-large")
-model = AutoModelForSeq2SeqLM.from_pretrained("zjunlp/chatcell-large")
-input_text="Detail the 100 starting genes for a Mix, ranked by expression level: "
-# Encode the input text and generate a response with specified generation parameters
-input_ids = tokenizer(input_text,return_tensors="pt").input_ids
-output_ids = model.generate(input_ids, max_length=512, num_return_sequences=1, no_repeat_ngram_size=2, top_k=50, top_p=0.95, do_sample=True)
-# Decode and print the generated output text
-output_text = tokenizer.decode(output_ids[0],skip_special_tokens=True)
-print(output_text)
-```
+<h2 id="4">📝 ✨ Acknowledgements</h2>

+Special thanks to the authors of [Cell2Sentence: Teaching Large Language Models the Language of Biology](https://github.com/vandijklab/cell2sentence-ft) and [Representing cells as sentences enables natural-language processing for single-cell transcriptomics
+](https://github.com/rahuldhodapkar/cell2sentence) for their inspiring work.


-<h2 id="4">📝 Cite</h2>
+<h2 id="5">📝 Cite</h2>

-If you use our repository, please cite the following related paper:
 ```
 @article{fang2024chatcell,
 title={ChatCell: Facilitating Single-Cell Analysis with Natural Language},
 author={Fang, Yin and Liu, Kangwei and Zhang, Ningyu and Deng, Xinle and Yang, Penghui and Chen, Zhuo and Tang, Xiangru and Gerstein, Mark and Fan, Xiaohui and Chen, Huajun},
-journal={arXiv preprint arXiv:2402.08303},
 year={2024},
 }
 ```
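The README's overview, visible in the removed bullets above, says that scRNA-seq data is first converted into a "single-cell language" an LLM can read. The snippet below is only a rough sketch of that idea in the spirit of the Cell2Sentence work the README credits: rank genes by expression and emit their symbols. The gene names, values, and the `to_cell_sentence` helper are invented for illustration and are not part of the ChatCell codebase.

```python
import numpy as np

# Toy expression vector and matching gene symbols (values invented for illustration).
gene_names = np.array(["CD3D", "MS4A1", "LYZ", "NKG7", "PPBP"])
expression = np.array([0.2, 5.1, 3.4, 0.0, 1.7])

def to_cell_sentence(expr, genes, top_k=3):
    """Rank genes by expression and keep the top_k symbols, so a cell becomes
    a short sequence of gene names (a 'cell sentence')."""
    order = np.argsort(expr)[::-1][:top_k]  # indices of the most expressed genes
    return " ".join(genes[order])

print(to_cell_sentence(expression, gene_names))  # -> "MS4A1 LYZ PPBP"
```

The real pipeline works on full expression matrices with a fixed gene vocabulary; this toy keeps only the ranking step.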
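The same overview bullets mention vocabulary adaptation, that is, enriching the model with a specialized single-cell lexicon. How ChatCell actually performs this step is not shown in the diff; the sketch below only demonstrates the generic Hugging Face `transformers` mechanism for growing a tokenizer and embedding table, with `t5-small` and a handful of gene symbols used purely as stand-ins.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# "t5-small" is only a convenient public checkpoint for the demonstration;
# it is not a claim about which backbone ChatCell uses.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# A few gene symbols standing in for a specialized single-cell lexicon.
new_gene_tokens = ["CD3D", "MS4A1", "NKG7"]
num_added = tokenizer.add_tokens(new_gene_tokens)

# Give the new vocabulary entries rows in the embedding matrix.
model.resize_token_embeddings(len(tokenizer))
print(f"Added {num_added} tokens; tokenizer size is now {len(tokenizer)}")
```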
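Finally, a note on using the result of the Quickstart snippet: assuming the model answers the "Detail the 100 starting genes" prompt with a space-separated list of gene symbols (an assumption based on the prompt wording and the cell-sentence format, not something the README states), the decoded text can be split back into a ranked gene list.

```python
# Continuing from the README's Quickstart snippet, where output_text holds the decoded
# generation. A placeholder value is used here so the example runs on its own.
output_text = "MALAT1 TMSB4X B2M EEF1A1 ACTB"  # stand-in for a real model response

# Split the assumed space-separated gene symbols back into a ranked list.
ranked_genes = output_text.strip().split()
for rank, gene in enumerate(ranked_genes, start=1):
    print(f"{rank}. {gene}")
```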