alea-institute
committed on
Upload README.md with huggingface_hub
README.md
CHANGED
@@ -1,11 +1,6 @@
 ---
-library_name:
-tags:
-- kl3m
-- kl3m-003
-- alea
-- legal
-- financial
+library_name: tokenizers
+tags: ['kl3m', 'kl3m-001', 'alea', 'legal', 'financial']
 date: 2023-12-28
 ---
 
@@ -16,6 +11,8 @@ The `kl3m-001-32k` tokenizer is a domain-specific tokenizer trained on ~500B of
 This tokenizer was used for the first generation of KL3M embedding and generative models, including
 `kl3m-170M`, `kl3m-1.7B`, `kl3m-embedding-001`, and `kl3m-embedding-002`.
 
+Please see `kl3m-003-64k` for the next iteration of our research on domain-specific tokenization.
+
 ## Model Details
 
 
@@ -141,3 +138,5 @@ Tokenizer and dataset publications are pending.
 
 For any questions, please contact [ALEA Institute](https://aleainstitute.ai) at [hello@aleainstitute.ai](mailto:hello@aleainstitute.ai) or
 create an issue on this repository or [GitHub](https://github.com/alea-institute/kl3m-embedding-research).
+
+![logo](https://aleainstitute.ai/images/alea-logo-ascii-1x1.png)