File size: 5,103 Bytes
1f48e4e
 
5060c28
 
 
 
 
1f48e4e
5060c28
 
54a53da
5060c28
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
b3b50ae
5060c28
b3b50ae
5060c28
 
 
 
 
 
 
 
 
 
 
 
5931641
 
 
 
 
 
 
 
 
 
 
 
 
 
 
5060c28
 
 
 
 
54a53da
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
---
license: apache-2.0
tags:
- chemistry
- biology
- molecule
- instructions
---


This repo contains a low-rank adapter for [LLaMA2-7b-chat](https://huggingface.co/meta-llama/Llama-2-7b-chat), trained on the 🔬 **molecule-oriented instructions** from the 🧪 [Mol-Instructions](https://huggingface.co/datasets/zjunlp/Mol-Instructions) dataset.


Instructions for running it can be found at https://github.com/zjunlp/Mol-Instructions.

> Please refer to our [paper](https://arxiv.org/abs/2306.08018) for more details.


![image.png](logo.png)

<h3> 🔬 Tasks</h3>


<details>
  <summary><b>Molecule description generation</b></summary>
  
- *Please give me some details about this molecule:*
 [C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][=Branch1][C][=O][O][C@H1][Branch2][Ring1][=Branch1][C][O][C][=Branch1][C][=O][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][C][O][P][=Branch1][C][=O][Branch1][C][O][O][C][C@@H1][Branch1][=Branch1][C][=Branch1][C][=O][O][N]

  ```
  The molecule is a 3-sn-phosphatidyl-L-serine in which the phosphatidyl acyl groups at positions 1 and 2 are specified as stearoyl and arachidonoyl respectively. 
  It is functionally related to an arachidonic acid and an octadecanoic acid.
  ```
</details>

<details>
  <summary><b>Description-guided molecule design</b></summary>
  
- *Create a molecule with the structure as the one described:*
  The molecule is a primary arylamine in which an amino functional group is substituted for one of the benzene hydrogens. It is a primary arylamine and a member of anilines.
  
  ```
  [N][C][=C][C][=C][C][=C][Ring1][=Branch1]
  ```
</details>

<details>
  <summary><b>Forward reaction prediction</b></summary>
  
- *With the provided reactants and reagents, propose a potential product:*
  [O][=N+1][Branch1][C][O-1][C][=C][N][=C][Branch1][C][Cl][C][Branch1][C][I][=C][Ring1][Branch2].[Fe]
  
   ```
  [N][C][=C][N][=C][Branch1][C][Cl][C][Branch1][C][I][=C][Ring1][Branch2]
  ```
</details>

<details>
  <summary><b>Retrosynthesis</b></summary>
  
- *Please suggest potential reactants used in the synthesis of the provided product:*
  [C][=C][C][C][N][C][=Branch1][C][=O][O][C][Branch1][C][C][Branch1][C][C][C]
  
  ```
  [C][=C][C][C][N].[C][C][Branch1][C][C][Branch1][C][C][O][C][=Branch1][C][=O][O][C][=Branch1][C][=O][O][C][Branch1][C][C][Branch1][C][C][C]
  ```
</details>


<details>
  <summary><b>Reagent prediction</b></summary>
  
- *Please provide possible reagents based on the following chemical reaction:*
  [C][C][=C][C][=C][Branch1][C][N][C][=N][Ring1][#Branch1].[O][=C][Branch1][C][Cl][C][Cl]>>[C][C][=C][C][=C][Branch1][Branch2][N][C][=Branch1][C][=O][C][Cl][C][=N][Ring1][O]

  ```
  [C][C][C][O][C][Ring1][Branch1].[C][C][N][Branch1][Ring1][C][C][C][C].[O]
  ```
</details>

<details>
  <summary><b>Property prediction</b></summary>
  
- *Please provide the HOMO energy value for this molecule:*
  [C][C][O][C][C][Branch1][C][C][C][Branch1][C][C][C]

  ```
  -0.2482
  ```
</details>


<h3> 📝 Demo</h3>

As illustrated in [our repository](https://github.com/zjunlp/Mol-Instructions/tree/main/demo), we provide an example to perform generation.

```shell
>> python generate.py \
    --CLI True \
    --protein False\
    --load_8bit \
    --base_model $BASE_MODEL_PATH \
    --lora_weights $FINETUNED_MODEL_PATH \
```

Please download [Llama-2-7b-chat](https://huggingface.co/meta-llama/Llama-2-7b-chat) to obtain the pre-training weights of LlamA-2-7b-chat, refine the `--base_model` to point towards the location where the model weights are saved.

For model fine-tuned on **molecule-oriented** instructions, set `$FINETUNED_MODEL_PATH` to `'zjunlp/llama2-molinst-molecule-7b'`.




<h3> 🚨 Limitations</h3>

The current state of the model, obtained via instruction tuning, is a preliminary demonstration. Its capacity to handle real-world, production-grade tasks remains limited. 

<h3> 📚 References</h3>
If you use our repository, please cite the following related paper:

```
@inproceedings{fang2023mol,
  author       = {Yin Fang and
                  Xiaozhuan Liang and
                  Ningyu Zhang and
                  Kangwei Liu and
                  Rui Huang and
                  Zhuo Chen and
                  Xiaohui Fan and
                  Huajun Chen},
  title        = {Mol-Instructions: {A} Large-Scale Biomolecular Instruction Dataset
                  for Large Language Models},
  booktitle    = {{ICLR}},
  publisher    = {OpenReview.net},
  year         = {2024},
  url          = {https://openreview.net/pdf?id=Tlsdsb6l9n}
}
```
            
<h3> 🫱🏻‍🫲🏾 Acknowledgements</h3>

We appreciate [LLaMA-2](https://ai.meta.com/llama), [LLaMA](https://github.com/facebookresearch/llama), [Huggingface Transformers Llama](https://github.com/huggingface/transformers/tree/main/src/transformers/models/llama), [Alpaca](https://crfm.stanford.edu/2023/03/13/alpaca.html), [Alpaca-LoRA](https://github.com/tloen/alpaca-lora), [Chatbot Service](https://github.com/deep-diver/LLM-As-Chatbot) and many other related works for their open-source contributions.