Update README.md
README.md CHANGED
@@ -9,7 +9,7 @@ language:
---
# Model description
[AI Sweden](https://huggingface.co/AI-Sweden-Models/)
-[GPT-Sw3 126M](https://huggingface.co/AI-Sweden-Models/gpt-sw3-126m/) | [GPT-Sw3 356M](https://huggingface.co/AI-Sweden-Models/gpt-sw3-356m/) | [GPT-Sw3 1.3B](https://huggingface.co/AI-Sweden-Models/gpt-sw3-1.3b/) | [GPT-Sw3 6.7B
+[GPT-Sw3 126M](https://huggingface.co/AI-Sweden-Models/gpt-sw3-126m/) | [GPT-Sw3 356M](https://huggingface.co/AI-Sweden-Models/gpt-sw3-356m/) | [GPT-Sw3 1.3B](https://huggingface.co/AI-Sweden-Models/gpt-sw3-1.3b/) | [GPT-Sw3 6.7B](https://huggingface.co/AI-Sweden-Models/gpt-sw3-6.7b/) | [GPT-Sw3 6.7B v2](https://huggingface.co/AI-Sweden-Models/gpt-sw3-6.7b-v2/) | [GPT-Sw3 20B](https://huggingface.co/AI-Sweden-Models/gpt-sw3-20b/) | [GPT-Sw3 40B](https://huggingface.co/AI-Sweden-Models/gpt-sw3-40b/)
[GPT-Sw3 126M Instruct](https://huggingface.co/AI-Sweden-Models/gpt-sw3-126m-instruct/) | [GPT-Sw3 356M Instruct](https://huggingface.co/AI-Sweden-Models/gpt-sw3-356m-instruct/) | [GPT-Sw3 1.3B Instruct](https://huggingface.co/AI-Sweden-Models/gpt-sw3-1.3b-instruct/) | [GPT-Sw3 6.7B v2 Instruct](https://huggingface.co/AI-Sweden-Models/gpt-sw3-6.7b-v2-instruct/) | [GPT-Sw3 20B Instruct](https://huggingface.co/AI-Sweden-Models/gpt-sw3-20b-instruct/)

GPT-SW3 is a collection of large decoder-only pretrained transformer language models developed by AI Sweden in collaboration with RISE and the WASP WARA for Media and Language. GPT-SW3 has been trained on a dataset containing 320B tokens in Swedish, Norwegian, Danish, Icelandic, English, and programming code. The model was pretrained using a causal language modeling (CLM) objective with the NeMo Megatron GPT implementation.

@@ -18,7 +18,7 @@ GPT-SW3 is a collection of large decoder-only pretrained transformer language mo
This version of the 6.7 billion parameter model was trained with the same tokenizer as the other model sizes, but on a different data distribution (much more English and code) and for longer.

# Intended use
-GPT-SW3 is an autoregressive large language model capable of generating coherent text in 5 different languages and 4 programming languages. GPT-SW3 can also be instructed to perform text tasks that it has not been explicitly trained for by casting them as text generation tasks.
+GPT-SW3 is an autoregressive large language model capable of generating coherent text in 5 different languages and 4 programming languages. GPT-SW3 can also be instructed to perform text tasks that it has not been explicitly trained for by casting them as text generation tasks.

# Limitations
Like other large language models, whose output quality is shaped by the diversity (or lack thereof) of their training data, GPT-SW3 has limitations with respect to, for example, bias and safety. GPT-SW3 can also have quality issues in terms of generation diversity and hallucination. By releasing the model under the modified RAIL license, we also hope to increase communication, transparency, and the study of large language models. The model may overrepresent some viewpoints and underrepresent others, contain stereotypes, and generate hateful, abusive, violent, discriminatory, or prejudicial language. The model may make errors, including producing incorrect information as if it were factual; it may generate irrelevant or repetitive outputs, as well as content that is not appropriate for all settings, including sexual content.

@@ -78,7 +78,7 @@ Following Mitchell et al. (2018), we provide a model card for GPT-SW3.
- Model type: GPT-SW3 is a large decoder-only transformer language model.
- Information about training algorithms, parameters, fairness constraints or other applied approaches, and features: GPT-SW3 was trained with the NeMo Megatron GPT implementation.
- Paper or other resource for more information: N/A.
-- License: [
+- License: [LICENSE](https://huggingface.co/AI-Sweden-Models/gpt-sw3-6.7b-v2/blob/main/LICENSE).
- Where to send questions or comments about the model: nlu@ai.se

# Intended Use

@@ -107,7 +107,7 @@ Following Mitchell et al. (2018), we provide a model card for GPT-SW3.

- Books
  - Litteraturbanken (https://litteraturbanken.se/)
-  - The Pile
+  - The Pile

- Articles
  - Diva (https://www.diva-portal.org/)
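The model description above says pretraining used a causal language modeling (CLM) objective. As a quick reference (a standard formulation, not quoted from this card): for a token sequence $x_1, \ldots, x_T$, training minimizes the negative log-likelihood of each token given the tokens before it,

$$
\mathcal{L}_{\mathrm{CLM}}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right),
$$

which is exactly the next-token prediction loss a decoder-only transformer optimizes.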
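Since the intended-use text casts arbitrary text tasks as text generation, a short usage sketch may help. This is a minimal sketch, assuming the standard Hugging Face transformers AutoTokenizer / AutoModelForCausalLM API and the model id linked above; the prompt and sampling parameters are illustrative, not taken from this card.

```python
# Minimal sketch (not from this card): prompting GPT-SW3 as a plain
# causal LM with the standard Hugging Face transformers generation API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AI-Sweden-Models/gpt-sw3-6.7b-v2"  # model id from the links above
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
)
model.to(device)
model.eval()

# Any text task can be cast as generation; here, a Swedish continuation.
prompt = "Träd är fina för att"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,   # length of the continuation
        do_sample=True,      # sample rather than greedy decode
        temperature=0.7,
        top_p=0.9,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Tasks the model was not explicitly trained for (e.g. summarization or translation) would be phrased directly in the prompt in the same way.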