stanford-crfm/pubmed_gpt_tokenizer
pubmed_gpt_tokenizer/tokenizer.json (branch: main)

Commit History
revert to 28k vocab · 3d6a170 · J38 committed on Oct 31, 2022
update to new 43k tokenizer for experiment · b101750 · J38 committed on Oct 29, 2022
fix id issue · ae895c0 · J38 committed on Oct 24, 2022
fix main tokenizer file · 9541fdc · J38 committed on Oct 23, 2022
add end of text token · 183fbc8 · J38 committed on Sep 16, 2022
28k vocab size, prefix_space=False, truecase · c8684a4 · J38 committed on Sep 16, 2022
50k vocab, prefix_space=false, trained on PubMed Abstracts · 39545d2 · J38 committed on Sep 15, 2022
experiment with 50k vocab · 9d29e9b · J38 committed on Sep 14, 2022
add <|endoftext|> · 19ffb18 · J38 committed on Sep 9, 2022
add <|endoftext|> · 7a0b15c · J38 committed on Sep 9, 2022
add lowercase normalizer · f344ee9 · J38 committed on Sep 5, 2022
tokenizer model · db29f29 · J38 committed on Sep 5, 2022
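The settings named in the commit messages (a 28k vocabulary, prefix_space=False, an end-of-text special token, training on PubMed abstracts) can be reproduced in miniature with the Hugging Face `tokenizers` library. This is an illustrative sketch, not the actual training recipe: the corpus below is made up, and the tiny vocab size stands in for the real 28k.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import ByteLevel
from tokenizers.trainers import BpeTrainer

# Tiny illustrative corpus; the real tokenizer was trained on PubMed abstracts.
corpus = [
    "Aspirin reduces the risk of myocardial infarction.",
    "The patients received 100 mg of the drug daily.",
    "Gene expression was measured by quantitative PCR.",
]

tokenizer = Tokenizer(BPE())
# prefix_space=False per the commit messages: no space is prepended to the input
# before byte-level pre-tokenization.
tokenizer.pre_tokenizer = ByteLevel(add_prefix_space=False)

# The repo's final vocab is 28k; 300 keeps this demo fast.
trainer = BpeTrainer(vocab_size=300, special_tokens=["<|endoftext|>"])
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Special tokens are assigned the lowest ids, so the end-of-text token is id 0.
print(tokenizer.token_to_id("<|endoftext|>"))

enc = tokenizer.encode("Aspirin reduces the risk.")
print(enc.tokens)
```

To use the published tokenizer itself, `AutoTokenizer.from_pretrained("stanford-crfm/pubmed_gpt_tokenizer")` from the `transformers` library loads this repo's `tokenizer.json` directly.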