stanford-crfm/pubmed_gpt_tokenizer
Commit History (branch: main)
revert to 28k vocab · 3d6a170 · J38 committed on Oct 31, 2022
update to new 43k tokenizer for experiment · b101750 · J38 committed on Oct 29, 2022
fix id issue · ae895c0 · J38 committed on Oct 24, 2022
fix main tokenizer file · 9541fdc · J38 committed on Oct 23, 2022
add end of text token · 183fbc8 · J38 committed on Sep 16, 2022
28k vocab size, prefix_space=False, truecase · c8684a4 · J38 committed on Sep 16, 2022
50k vocab, prefix_space=False, trained on PubMed Abstracts · 39545d2 · J38 committed on Sep 15, 2022
experiment with 50k vocab · 9d29e9b · J38 committed on Sep 14, 2022
add |endoftext| · 19ffb18 · J38 committed on Sep 9, 2022
add |endoftext| · 7a0b15c · J38 committed on Sep 9, 2022
use lowercase normalizer · 514166f · J38 committed on Sep 5, 2022
does bert normalizer work · 7fd39d4 · J38 committed on Sep 5, 2022
change lowercase key · 5940b13 · J38 committed on Sep 5, 2022
add normalizer · 15bb604 · J38 committed on Sep 5, 2022
add lowercase normalizer · f344ee9 · J38 committed on Sep 5, 2022
add merges.txt · c19b598 · J38 committed on Sep 5, 2022
add vocab.json · 5241c39 · J38 committed on Sep 5, 2022
add vocab file · cab4e59 · J38 committed on Sep 5, 2022
do lower case · 4bea54c · J38 committed on Sep 5, 2022
config for tokenizer · c682350 · J38 committed on Sep 5, 2022
tokenizer model · db29f29 · J38 committed on Sep 5, 2022
initial commit · c2f9a73 · J38 committed on Sep 5, 2022