alea-institute committed on
Commit f0f9007 · verified · 1 Parent(s): 0aa2f6f

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +3 -8
README.md CHANGED
@@ -1,17 +1,12 @@
  ---
  library_name: tokenizers
- tags:
- - kl3m
- - kl3m-001
- - alea
- - legal
- - financial
+ tags: ['kl3m', 'kl3m-001', 'alea', 'legal', 'financial']
  date: 2023-12-28
  ---

  # kl3m-001-32k tokenizer

- The `kl3m-001-32k` tokenizer is a domain-specific tokenizer trained on ~500B of financial and legal text from primarily-English sources.
+ The `kl3m-001-32k` tokenizer is a domain-specific tokenizer trained on ~500B tokens of financial and legal text from primarily-English sources.

  This tokenizer was used for the first generation of KL3M embedding and generative models, including
  `kl3m-170M`, `kl3m-1.7B`, `kl3m-embedding-001`, and `kl3m-embedding-002`.
@@ -33,7 +28,7 @@ Please see `kl3m-003-64k` for the next iteration of our research on domain-speci

  ### Model Description

- The `kl3m-001-32k` tokenizer is a domain-specific tokenizer trained on ~500B of financial and legal text from primarily-English sources.
+ The `kl3m-001-32k` tokenizer is a domain-specific tokenizer trained on ~500B tokens of financial and legal text from primarily-English sources.

  This tokenizer is notable for a number of reasons: