---
license: mit
tags:
- biology
- genomics
- dna
---

# Tokenizer for causal language modeling of DNA sequences

```json
{
  "vocab": {
    "[PAD]": 0,
    "[UNK]": 1,
    "a": 2,
    "c": 3,
    "g": 4,
    "t": 5
  }
}
```
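The vocabulary maps each nucleotide to its own id, with `[PAD]` and `[UNK]` reserved as 0 and 1. The sketch below shows how a sequence could be turned into ids under that vocabulary. It is a minimal, hypothetical illustration and not the tokenizer's actual implementation. It assumes character-level splitting, lowercasing of the input, and right-padding. The helper names `encode` and `decode` and the `max_length` parameter are illustrative only.

```python
from typing import Dict, List, Optional

# Vocabulary from the model card: two special tokens, then one token per nucleotide.
VOCAB: Dict[str, int] = {"[PAD]": 0, "[UNK]": 1, "a": 2, "c": 3, "g": 4, "t": 5}
ID_TO_TOKEN: Dict[int, str] = {i: tok for tok, i in VOCAB.items()}


def encode(sequence: str, max_length: Optional[int] = None) -> List[int]:
    """Hypothetical character-level encoder (assumes input is lowercased).

    Characters outside the vocabulary (e.g. 'N' for an ambiguous base)
    map to [UNK]. If max_length is given, the ids are truncated or
    right-padded with [PAD] to exactly that length.
    """
    ids = [VOCAB.get(ch, VOCAB["[UNK]"]) for ch in sequence.lower()]
    if max_length is not None:
        ids = ids[:max_length] + [VOCAB["[PAD]"]] * max(0, max_length - len(ids))
    return ids


def decode(ids: List[int]) -> str:
    """Invert encode(), dropping [PAD]; [UNK] is rendered as the literal '[UNK]'."""
    return "".join(ID_TO_TOKEN[i] for i in ids if i != VOCAB["[PAD]"])


if __name__ == "__main__":
    print(encode("ACGTN"))               # [2, 3, 4, 5, 1]
    print(encode("acg", max_length=5))   # [2, 3, 4, 0, 0]
    print(decode([2, 3, 4, 0, 0]))       # acg
```

Because every base is a single token, sequence length in tokens equals sequence length in nucleotides. Under this sketch's assumptions, ambiguous IUPAC codes such as `N` all collapse to the single `[UNK]` id.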