jsunn-y committed on
Commit
dde65c9
·
verified ·
1 Parent(s): c4f99ac

Update README.md

Files changed (1)
  1. README.md +21 -1
README.md CHANGED
@@ -5,7 +5,27 @@ license: bsd-3-clause
5
  # ProCALM
6
  [ProCALM](https://github.com/jsunn-y/ProCALM/tree/main) (Protein Conditionally Adapted Language Model) is a suite of models where [ProGen2-base](https://github.com/enijkamp/progen2) is finetuned with conditional adapters for conditional generation of functional enzymes, based on EC number, taxonomy, or both.
7
 
8
- ProCALM models share `tokenizer.json` and individual models are organized into subfolders. We have uploaded the most relevant models here, but please reach out if you would like to use other models from our paper. `1.5B` and `9B` refer to checkpoints trained to 1.5 and 9 billion tokens, respectively. More usage details can be found in [github](https://github.com/jsunn-y/ProCALM/tree/main) and in our paper.
9
 
10
  | Name | Description |
11
  |:--------|:-------:|
 
5
  # ProCALM
6
  [ProCALM](https://github.com/jsunn-y/ProCALM/tree/main) (Protein Conditionally Adapted Language Model) is a suite of models where [ProGen2-base](https://github.com/enijkamp/progen2) is finetuned with conditional adapters for conditional generation of functional enzymes, based on EC number, taxonomy, or both.
7
 
8
+ ProCALM models share `tokenizer.json` and individual models are organized into subfolders. We have uploaded the most relevant models here, but please reach out if you would like to use other models from our paper. `1.5B` and `9B` refer to checkpoints trained to 1.5 and 9 billion tokens, respectively.
9
+
10
+ ## Quickstart
11
+ Usage details with examples can be found on [GitHub](https://github.com/jsunn-y/ProCALM/tree/main) under "Generation" and in our paper. An example framework for generation from pretrained models:
12
+ ```
13
+ import torch
14
+ from tokenizers import Tokenizer
15
+ from model import ProgenConditional  # custom class from the ProCALM GitHub code
16
+ model = ProgenConditional.from_pretrained("jsunn-y/ProCALM", subfolder="ec-onehot-swissprot/1.5B")
17
+ tokenizer = Tokenizer.from_pretrained("jsunn-y/ProCALM")
18
+ # context, device, condition_encodings, pad_token_id, and the sampling settings (temperature, top_p, max_length, num_return_sequences) are placeholders to define beforehand
19
+ with torch.no_grad():
20
+     input_ids = torch.tensor(tokenizer.encode(context).ids).view([1, -1]).to(device)
21
+     tokens_batch = model.generate(input_ids=input_ids, condition_encodings=condition_encodings, do_sample=True, temperature=temperature, max_length=max_length, top_p=top_p, num_return_sequences=num_return_sequences, pad_token_id=pad_token_id, eos_token_id=4)
22
+
23
+ as_lists = lambda batch: [batch[i, ...].detach().cpu().numpy().tolist() for i in range(batch.shape[0])]
24
+ sequences = tokenizer.decode_batch(as_lists(tokens_batch))
25
+ ```
26
+ Note that `condition_encodings` is a representation of the conditioning, which can be computed using the `.pt` dictionaries provided in our GitHub repository under `data`.
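
As a rough sketch of how such a lookup might work: the README does not describe the layout of the `.pt` files, so this assumes each one deserializes to a plain dictionary keyed by a condition label (for example an EC number string), with one encoding vector per key. The function name `lookup_encoding` and the toy table are hypothetical. In practice the table would presumably be loaded with `torch.load` from a file under `data` and hold tensors, and the exact form that `model.generate` expects for `condition_encodings` should be checked against the repository.

```python
from typing import Dict, List, Sequence


def lookup_encoding(table: Dict[str, Sequence[float]], key: str) -> List[float]:
    """Return the encoding stored for `key`, failing loudly if it is absent."""
    if key not in table:
        raise KeyError(f"no encoding available for condition {key!r}")
    return list(table[key])


# Toy stand-in for a dictionary loaded from one of the .pt files (assumed layout).
toy_table = {
    "1.1.1.1": [1.0, 0.0, 0.0],
    "2.7.11.1": [0.0, 1.0, 0.0],
}

encoding = lookup_encoding(toy_table, "2.7.11.1")
```

Failing early on an unknown label keeps a mistyped EC number from reaching the generation call.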
27
+
28
+ ## Summary of Available Models
29
 
30
  | Name | Description |
31
  |:--------|:-------:|