alea-institute committed on
Commit 54bcfca · verified · 1 Parent(s): 6232bc0

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +6 -7
README.md CHANGED
@@ -1,11 +1,6 @@
 ---
-library_name: transformers
-tags:
-- kl3m
-- kl3m-003
-- alea
-- legal
-- financial
+library_name: tokenizers
+tags: ['kl3m', 'kl3m-001', 'alea', 'legal', 'financial']
 date: 2023-12-28
 ---
 
@@ -16,6 +11,8 @@ The `kl3m-001-32k` tokenizer is a domain-specific tokenizer trained on ~500B of
 This tokenizer was used for the first generation of KL3M embedding and generative models, including
 `kl3m-170M`, `kl3m-1.7B`, `kl3m-embedding-001`, and `kl3m-embedding-002`.
 
+Please see `kl3m-003-64k` for the next iteration of our research on domain-specific tokenization.
+
 ## Model Details
 
 
@@ -141,3 +138,5 @@ Tokenizer and dataset publications are pending.
 
 For any questions, please contact [ALEA Institute](https://aleainstitute.ai) at [hello@aleainstitute.ai](mailto:hello@aleainstitute.ai) or
 create an issue on this repository or [GitHub](https://github.com/alea-institute/kl3m-embedding-research).
+
+![logo](https://aleainstitute.ai/images/alea-logo-ascii-1x1.png)