pubmed_gpt_tokenizer / merges.txt

Commit History

revert to 28k vocab
3d6a170

J38 committed on

update to new 43k tokenizer for experiment
b101750

J38 committed on

28k vocab size, prefix_space=False, truecase
c8684a4

J38 committed on

50k vocab, prefix_space=false,trained on PubMed Abstracts
39545d2

J38 committed on

experiment with 50k vocab
9d29e9b

J38 committed on

add merges.txt
c19b598

J38 committed on