Commit History

revert to 28k vocab
3d6a170

J38 committed on

update to new 43k tokenizer for experiment
b101750

J38 committed on

fix id issue
ae895c0

J38 committed on

fix main tokenizer file
9541fdc

J38 committed on

add end of text token
183fbc8

J38 committed on

28k vocab size, prefix_space=False, truecase
c8684a4

J38 committed on

50k vocab, prefix_space=false,trained on PubMed Abstracts
39545d2

J38 committed on

experiment with 50k vocab
9d29e9b

J38 committed on

add of |endoftext|
19ffb18

J38 committed on

add of |endoftext|
7a0b15c

J38 committed on

use lowercase normalizer
514166f

J38 committed on

does bert normalizer work
7fd39d4

J38 committed on

change lowercase key
5940b13

J38 committed on

add normalizer
15bb604

J38 committed on

add lowercase normalizer
f344ee9

J38 committed on

add merges.txt
c19b598

J38 committed on

add vocab.json
5241c39

J38 committed on

add vocab file
cab4e59

J38 committed on

do lower case
4bea54c

J38 committed on

config for tokenizer
c682350

J38 committed on

tokenizer model
db29f29

J38 committed on

initial commit
c2f9a73

J38 committed on