alea-institute/kl3m-001-32k

@@ -1,17 +1,12 @@
 ---
 library_name: tokenizers
-tags:
-- kl3m
-- kl3m-001
-- alea
-- legal
-- financial
+tags: ['kl3m', 'kl3m-001', 'alea', 'legal', 'financial']
 date: 2023-12-28
 ---
 # kl3m-001-32k tokenizer
-The `kl3m-001-32k` tokenizer is a domain-specific tokenizer trained on ~500B of financial and legal text from primarily-English sources.
+The `kl3m-001-32k` tokenizer is a domain-specific tokenizer trained on ~500B tokens of financial and legal text from primarily-English sources.
 This tokenizer was used for the first generation of KL3M embedding and generative models, including
 `kl3m-170M`, `kl3m-1.7B`, `kl3m-embedding-001`, and `kl3m-embedding-002`.
@@ -33,7 +28,7 @@ Please see `kl3m-003-64k` for the next iteration of our research on domain-speci
 ### Model Description
-The `kl3m-001-32k` tokenizer is a domain-specific tokenizer trained on ~500B of financial and legal text from primarily-English sources.
+The `kl3m-001-32k` tokenizer is a domain-specific tokenizer trained on ~500B tokens of financial and legal text from primarily-English sources.
 This tokenizer is notable for a number of reasons:
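The front-matter change in this diff rewrites `tags` from a YAML block sequence (one `- item` per line) to a single-line flow sequence. Both spellings should describe the same five-item list. The following is a minimal, stdlib-only sketch that checks this for these two exact snippets. The helper names `parse_block_tags` and `parse_flow_tags` are hypothetical, and the parsing is deliberately simplified. It relies on the flow value also happening to be a valid Python list literal, so it is not general YAML parsing. A real model card should be read with a full YAML parser.

```python
import ast

# Old front matter: YAML block sequence, one "- item" per line.
OLD_TAGS = """tags:
- kl3m
- kl3m-001
- alea
- legal
- financial"""

# New front matter: YAML flow sequence on a single line.
NEW_TAGS = "tags: ['kl3m', 'kl3m-001', 'alea', 'legal', 'financial']"


def parse_block_tags(text: str) -> list[str]:
    """Collect items from a simple block sequence under 'tags:'.

    Hypothetical helper, simplified: assumes plain unquoted scalars,
    no nesting and no comments.
    """
    lines = text.splitlines()
    assert lines[0].strip() == "tags:"
    # Drop the leading "- " marker from each item line.
    return [line.strip()[2:].strip() for line in lines[1:] if line.strip().startswith("- ")]


def parse_flow_tags(text: str) -> list[str]:
    """Read a one-line flow sequence of single-quoted strings.

    Hypothetical helper, simplified: this value is also a valid Python
    list literal, so ast.literal_eval can read it. Not a YAML parser.
    """
    key, _, value = text.partition(":")
    assert key.strip() == "tags"
    return ast.literal_eval(value.strip())


old = parse_block_tags(OLD_TAGS)
new = parse_flow_tags(NEW_TAGS)
print(old == new)  # True: same tags, same order
```

Because the change only affects notation, tooling that reads the card's metadata should see an identical `tags` list before and after the commit.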