---
license: apache-2.0
tags:
- chemistry
- biology
- molecule
- instructions
---
This repo contains a low-rank adapter for [LLaMA2-7b-chat](https://huggingface.co/meta-llama/Llama-2-7b-chat), trained on the 🔬 **molecule-oriented instructions** from the 🧪 [Mol-Instructions](https://huggingface.co/datasets/zjunlp/Mol-Instructions) dataset.
Instructions for running it can be found at https://github.com/zjunlp/Mol-Instructions.
> Please refer to our [paper](https://arxiv.org/abs/2306.08018) for more details.
![Mol-Instructions logo](logo.png)
<h3> 🔬 Tasks</h3>
<details>
<summary><b>Molecule description generation</b></summary>
- *Please give me some details about this molecule:*
[C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][=Branch1][C][=O][O][C@H1][Branch2][Ring1][=Branch1][C][O][C][=Branch1][C][=O][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][O][P][=Branch1][C][=O][Branch1][C][O][O][C][C@@H1][Branch1][=Branch1][C][=Branch1][C][=O][O][N]
```
The molecule is a 3-sn-phosphatidyl-L-serine in which the phosphatidyl acyl groups at positions 1 and 2 are specified as stearoyl and arachidonoyl respectively.
It is functionally related to an arachidonic acid and an octadecanoic acid.
```
</details>
<details>
<summary><b>Description-guided molecule design</b></summary>
- *Create a molecule with the structure as the one described:*
The molecule is a primary arylamine in which an amino functional group is substituted for one of the benzene hydrogens. It is a primary arylamine and a member of anilines.
```
[N][C][=C][C][=C][C][=C][Ring1][=Branch1]
```
</details>
<details>
<summary><b>Forward reaction prediction</b></summary>
- *With the provided reactants and reagents, propose a potential product:*
[O][=N+1][Branch1][C][O-1][C][=C][N][=C][Branch1][C][Cl][C][Branch1][C][I][=C][Ring1][Branch2].[Fe]
```
[N][C][=C][N][=C][Branch1][C][Cl][C][Branch1][C][I][=C][Ring1][Branch2]
```
</details>
<details>
<summary><b>Retrosynthesis</b></summary>
- *Please suggest potential reactants used in the synthesis of the provided product:*
[C][=C][C][C][N][C][=Branch1][C][=O][O][C][Branch1][C][C][Branch1][C][C][C]
```
[C][=C][C][C][N].[C][C][Branch1][C][C][Branch1][C][C][O][C][=Branch1][C][=O][O][C][=Branch1][C][=O][O][C][Branch1][C][C][Branch1][C][C][C]
```
</details>
<details>
<summary><b>Reagent prediction</b></summary>
- *Please provide possible reagents based on the following chemical reaction:*
[C][C][=C][C][=C][Branch1][C][N][C][=N][Ring1][#Branch1].[O][=C][Branch1][C][Cl][C][Cl]>>[C][C][=C][C][=C][Branch1][Branch2][N][C][=Branch1][C][=O][C][Cl][C][=N][Ring1][O]
```
[C][C][C][O][C][Ring1][Branch1].[C][C][N][Branch1][Ring1][C][C][C][C].[O]
```
</details>
<details>
<summary><b>Property prediction</b></summary>
- *Please provide the HOMO energy value for this molecule:*
[C][C][O][C][C][Branch1][C][C][C][Branch1][C][C][C]
```
-0.2482
```
</details>
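The molecule strings in the task examples above are SELFIES, the sequence representation adopted by Mol-Instructions. Below is a minimal sketch of converting between SELFIES and SMILES with the open-source `selfies` package (an illustrative assumption, not part of this repo; install via `pip install selfies`):

```python
import selfies as sf

# SELFIES output from the description-guided design example above.
aniline_selfies = "[N][C][=C][C][=C][C][=C][Ring1][=Branch1]"
smiles = sf.decoder(aniline_selfies)  # e.g. "NC1=CC=CC=C1" (aniline)
print(smiles)

# The reverse direction, for building SELFIES inputs from SMILES:
print(sf.encoder("NC1=CC=CC=C1"))
```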
<h3> 📝 Demo</h3>
As illustrated in [our repository](https://github.com/zjunlp/Mol-Instructions/tree/main/demo), we provide an example of how to perform generation.
```shell
>> python generate.py \
    --CLI True \
    --protein False \
    --load_8bit \
    --base_model $BASE_MODEL_PATH \
    --lora_weights $FINETUNED_MODEL_PATH
```
Please download [Llama-2-7b-chat](https://huggingface.co/meta-llama/Llama-2-7b-chat) to obtain the pre-trained weights of Llama-2-7b-chat, and set `--base_model` to the location where the model weights are saved.
For the model fine-tuned on **molecule-oriented** instructions, set `$FINETUNED_MODEL_PATH` to `'zjunlp/llama2-molinst-molecule-7b'`.
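Alternatively, the adapter can be loaded programmatically. Below is a minimal sketch using the `transformers` and `peft` libraries; the exact prompt template used by `generate.py` may differ, so treat the prompt here as an illustrative placeholder:

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_model_path = "meta-llama/Llama-2-7b-chat-hf"  # or a local path
lora_weights = "zjunlp/llama2-molinst-molecule-7b"

tokenizer = LlamaTokenizer.from_pretrained(base_model_path)
model = LlamaForCausalLM.from_pretrained(
    base_model_path, torch_dtype=torch.float16, device_map="auto"
)
# Attach the molecule-oriented LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(model, lora_weights)
model.eval()

# Placeholder prompt; see generate.py for the template used in training.
prompt = "Please give me some details about this molecule: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```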
<h3> 🚨 Limitations</h3>
The current state of the model, obtained via instruction tuning, is a preliminary demonstration. Its capacity to handle real-world, production-grade tasks remains limited.
<h3> 📚 References</h3>
If you use our repository, please cite the following related paper:
```
@article{molinst,
title={Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models},
author={Fang, Yin and Liang, Xiaozhuan and Zhang, Ningyu and Liu, Kangwei and Huang, Rui and Chen, Zhuo and Fan, Xiaohui and Chen, Huajun},
journal={arXiv preprint arXiv:2306.08018},
year={2023}
}
```
<h3> 🫱🏻🫲🏾 Acknowledgements</h3>
We appreciate [LLaMA-2](https://ai.meta.com/llama), [LLaMA](https://github.com/facebookresearch/llama), [Huggingface Transformers Llama](https://github.com/huggingface/transformers/tree/main/src/transformers/models/llama), [Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html), [Alpaca-LoRA](https://github.com/tloen/alpaca-lora), [Chatbot Service](https://github.com/deep-diver/LLM-As-Chatbot) and many other related works for their open-source contributions. |