Update README.md
README.md
@@ -14,8 +14,15 @@ pipeline_tag: conversational
 ---
 # Model description
 [AI Sweden](https://huggingface.co/AI-Sweden-Models/)
-
-[GPT-Sw3 126M
+**Base models**
+[GPT-Sw3 126M](https://huggingface.co/AI-Sweden-Models/gpt-sw3-126m/) | [GPT-Sw3 356M](https://huggingface.co/AI-Sweden-Models/gpt-sw3-356m/) | [GPT-Sw3 1.3B](https://huggingface.co/AI-Sweden-Models/gpt-sw3-1.3b/)
+[GPT-Sw3 6.7B](https://huggingface.co/AI-Sweden-Models/gpt-sw3-6.7b/) | [GPT-Sw3 6.7B v2](https://huggingface.co/AI-Sweden-Models/gpt-sw3-6.7b-v2/) | [GPT-Sw3 20B](https://huggingface.co/AI-Sweden-Models/gpt-sw3-20b/)
+[GPT-Sw3 40B](https://huggingface.co/AI-Sweden-Models/gpt-sw3-40b/)
+**Instruct models**
+[GPT-Sw3 126M Instruct](https://huggingface.co/AI-Sweden-Models/gpt-sw3-126m-instruct/) | [GPT-Sw3 356M Instruct](https://huggingface.co/AI-Sweden-Models/gpt-sw3-356m-instruct/) | [GPT-Sw3 1.3B Instruct](https://huggingface.co/AI-Sweden-Models/gpt-sw3-1.3b-instruct/)
+[GPT-Sw3 6.7B v2 Instruct](https://huggingface.co/AI-Sweden-Models/gpt-sw3-6.7b-v2-instruct/) | [GPT-Sw3 20B Instruct](https://huggingface.co/AI-Sweden-Models/gpt-sw3-20b-instruct/)
+**Quantized models**
+[GPT-Sw3 6.7B v2 Instruct 4-bit gptq](https://huggingface.co/AI-Sweden-Models/gpt-sw3-6.7b-v2-instruct-4bit-gptq)
 
 GPT-SW3 is a collection of large decoder-only pretrained transformer language models that were developed by AI Sweden in collaboration with RISE and the WASP WARA for Media and Language. GPT-SW3 has been trained on a dataset containing 320B tokens in Swedish, Norwegian, Danish, Icelandic, English, and programming code. The model was pretrained using a causal language modeling (CLM) objective utilizing the NeMo Megatron GPT implementation.
 
@@ -159,7 +166,7 @@ Following Mitchell et al. (2018), we provide a model card for GPT-SW3.
 
 - Books
 - Litteraturbanken (https://litteraturbanken.se/)
-- The Pile
+- The Pile
 
 - Articles
 - Diva (https://www.diva-portal.org/)
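
The model description in the diff says GPT-SW3 was pretrained with a causal language modeling (CLM) objective. As a rough illustration of what that objective computes, here is a toy sketch in plain Python. It is not the NeMo Megatron implementation. The function name `causal_lm_loss`, the three-token vocabulary, and the probabilities are all invented for the example.

```python
import math


def causal_lm_loss(token_ids, next_token_probs):
    """Average negative log-likelihood of each token given its prefix.

    token_ids: the token sequence as a list of ints.
    next_token_probs: next_token_probs[t] maps a candidate token id to the
        probability a (hypothetical) model assigns to it after reading
        token_ids[: t + 1].
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a prediction target")

    # Position t predicts token t + 1, so the targets are the inputs shifted by one.
    targets = token_ids[1:]
    losses = [
        -math.log(next_token_probs[t][target])
        for t, target in enumerate(targets)
    ]
    return sum(losses) / len(losses)


# Toy data (assumed values): the model gives each correct next token probability 0.5.
tokens = [0, 1, 2]
probs = [
    {0: 0.25, 1: 0.5, 2: 0.25},  # after [0], the target is token 1
    {0: 0.25, 1: 0.25, 2: 0.5},  # after [0, 1], the target is token 2
]

loss = causal_lm_loss(tokens, probs)
perplexity = math.exp(loss)
print(f"loss={loss:.4f} perplexity={perplexity:.4f}")
```

In real training the probabilities come from a softmax over the model's logits and the average runs over very large batches of tokens, but the shift-by-one pairing of inputs and targets is the same idea.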