---
license: mit
tags:
- biology
- genomics
- dna
---
# Tokenizer for causal language modeling of DNA sequences
```json
"vocab": {
  "[PAD]": 0,
  "[UNK]": 1,
  "a": 2,
  "c": 3,
  "g": 4,
  "t": 5
},
```
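
To illustrate how this vocabulary maps a DNA sequence to token IDs, here is a minimal Python sketch. It is not the tokenizer's actual implementation: the function names `encode`, `decode`, and `pad` are hypothetical, and two behaviours are assumptions not stated above, namely that the input is split one character at a time and that uppercase bases are lowercased before lookup. Any character outside `a`, `c`, `g`, `t` (for example the ambiguity code `n`) falls back to `[UNK]`.

```python
# Vocabulary copied from the excerpt above.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "a": 2, "c": 3, "g": 4, "t": 5}
ID_TO_TOKEN = {idx: tok for tok, idx in VOCAB.items()}


def encode(sequence, lowercase=True):
    """Map each character of a DNA string to its token ID.

    Lowercasing is an assumption; the vocabulary only lists lowercase bases.
    Characters missing from the vocabulary map to [UNK].
    """
    if lowercase:
        sequence = sequence.lower()
    return [VOCAB.get(ch, VOCAB["[UNK]"]) for ch in sequence]


def pad(ids, length):
    """Right-pad a list of IDs with [PAD] up to `length`.

    Lists that are already `length` or longer are returned unchanged.
    """
    return ids + [VOCAB["[PAD]"]] * (length - len(ids))


def decode(ids):
    """Turn token IDs back into text, skipping [PAD] tokens."""
    return "".join(ID_TO_TOKEN[i] for i in ids if i != VOCAB["[PAD]"])


if __name__ == "__main__":
    ids = encode("ACGTN")
    print(ids)                   # [2, 3, 4, 5, 1]
    print(pad(ids, 8))           # [2, 3, 4, 5, 1, 0, 0, 0]
    print(decode(pad(ids, 8)))   # acgt[UNK]
```

Right-padding is chosen here only for simplicity; the padding side used for causal language modeling with this tokenizer is not specified above.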