---
license: mit
tags:
  - biology
  - genomics
  - dna
---

# Tokenizer for causal language modeling of DNA sequences

```json
{
  "vocab": {
    "[PAD]": 0,
    "[UNK]": 1,
    "a": 2,
    "c": 3,
    "g": 4,
    "t": 5
  }
}
```
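
The vocabulary above can be exercised with a minimal pure-Python sketch. It is not the tokenizer's actual implementation: it assumes character-level splitting, lowercasing of the input, mapping of any other character (for example `n`) to `[UNK]`, and right-padding with `[PAD]`. The helper names `encode` and `decode` are hypothetical.

```python
# Vocabulary copied from the tokenizer definition above.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "a": 2, "c": 3, "g": 4, "t": 5}
ID_TO_TOKEN = {token_id: token for token, token_id in VOCAB.items()}


def encode(sequence, max_length=None):
    """Map a DNA string to token IDs, one ID per nucleotide.

    Assumptions (not confirmed by the model card): input is lowercased,
    unknown characters become [UNK], and padding is added on the right.
    """
    ids = [VOCAB.get(base, VOCAB["[UNK]"]) for base in sequence.lower()]
    if max_length is not None:
        ids = ids[:max_length]  # truncate sequences that are too long
        ids += [VOCAB["[PAD]"]] * (max_length - len(ids))  # pad short ones
    return ids


def decode(ids):
    """Map token IDs back to a string, skipping [PAD] tokens."""
    return "".join(ID_TO_TOKEN[i] for i in ids if i != VOCAB["[PAD]"])


if __name__ == "__main__":
    print(encode("ACGT"))                # [2, 3, 4, 5]
    print(encode("acn", max_length=5))   # [2, 3, 1, 0, 0]
    print(decode([2, 3, 4, 5, 0]))       # acgt
```

With only four nucleotide tokens plus two special tokens, each base becomes exactly one token, so the encoded length equals the sequence length before padding.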