pubmed_gpt_tokenizer / tokenizer.json

Commit History

revert to 28k vocab
3d6a170

J38 committed on

update to new 43k tokenizer for experiment
b101750

J38 committed on

fix id issue
ae895c0

J38 committed on

fix main tokenizer file
9541fdc

J38 committed on

add end of text token
183fbc8

J38 committed on

28k vocab size, prefix_space=False, truecase
c8684a4

J38 committed on

50k vocab, prefix_space=false, trained on PubMed Abstracts
39545d2

J38 committed on

experiment with 50k vocab
9d29e9b

J38 committed on

add of |endoftext|
19ffb18

J38 committed on

add of |endoftext|
7a0b15c

J38 committed on

add lowercase normalizer
f344ee9

J38 committed on

tokenizer model
db29f29

J38 committed on