---
license: mit
tags:
- dna
- biology
- genomics
---

# Tokenizer for masked language modeling of DNA sequences

```json
"vocab": {
  "[PAD]": 0,
  "[MASK]": 1,
  "[UNK]": 2,
  "a": 3,
  "c": 4,
  "g": 5,
  "t": 6
},
```
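
As a rough illustration of how this vocabulary could be applied, the sketch below maps a DNA string to token IDs in plain Python. It assumes one token per nucleotide and that input is lowercased before lookup; neither is stated in this card. `encode` and `mask_positions` are hypothetical helpers written for this example, not part of the tokenizer itself.

```python
# Vocabulary copied from the JSON excerpt above.
VOCAB = {"[PAD]": 0, "[MASK]": 1, "[UNK]": 2, "a": 3, "c": 4, "g": 5, "t": 6}


def encode(sequence, max_length=None):
    """Hypothetical helper: map each base to its ID, one token per character."""
    # Assumption: lowercase the input, since the vocab only lists lowercase bases.
    # Anything outside a/c/g/t (for example "n") falls back to [UNK].
    ids = [VOCAB.get(base, VOCAB["[UNK]"]) for base in sequence.lower()]
    if max_length is not None:
        # Truncate, then right-pad with [PAD] up to max_length.
        ids = ids[:max_length]
        ids += [VOCAB["[PAD]"]] * (max_length - len(ids))
    return ids


def mask_positions(ids, positions):
    """Hypothetical helper: return a copy with the given positions set to [MASK]."""
    masked = list(ids)
    for i in positions:
        masked[i] = VOCAB["[MASK]"]
    return masked


ids = encode("ACGTN", max_length=8)
print(ids)                       # [3, 4, 5, 6, 2, 0, 0, 0]
print(mask_positions(ids, [1]))  # [3, 1, 5, 6, 2, 0, 0, 0]
```

With only seven entries, every ID is either a single base or one of the three special tokens, so a masked position has four possible nucleotide targets.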