Ekgren committed 8216996 (parent: 2cf5a44): Create README.md

---
license: openrail
---
# Model description
[AI Sweden](https://huggingface.co/AI-Sweden/) | [GPT-Sw3 126M](https://huggingface.co/AI-Sweden-Models/gpt-sw3-126m/) | [GPT-Sw3 356M](https://huggingface.co/AI-Sweden-Models/gpt-sw3-356m/) | [GPT-Sw3 1.3B](https://huggingface.co/AI-Sweden-Models/gpt-sw3-1.3b/) | [GPT-Sw3 6.7B](https://huggingface.co/AI-Sweden-Models/gpt-sw3-6.7b/) | [GPT-Sw3 20B](https://huggingface.co/AI-Sweden-Models/gpt-sw3-20b/)

GPT-SW3 is a collection of large decoder-only pretrained transformer language models developed by AI Sweden in collaboration with RISE and the WASP WARA for Media and Language. GPT-SW3 has been trained on a dataset containing 320B tokens in Swedish, Norwegian, Danish, Icelandic, English, and programming code. The models were pretrained with a causal language modeling (CLM) objective, using the NeMo Megatron GPT implementation.

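To make the causal language modeling objective concrete, the sketch below computes the per-sequence loss in plain Python: the scores produced at each position are judged against the *next* token, and the loss is the mean negative log-likelihood of those next tokens. This is a simplified illustration, assuming a single unpadded sequence and logits given as nested lists; the helper name `causal_lm_loss` is hypothetical, and this is not the NeMo Megatron code used to train GPT-SW3.

```python
import math
from typing import List


def causal_lm_loss(logits: List[List[float]], token_ids: List[int]) -> float:
    """Mean next-token cross-entropy for one sequence (hypothetical helper).

    logits[t] holds unnormalized scores over the vocabulary produced after
    reading token_ids[0..t]; it is scored against token_ids[t + 1].
    """
    if len(logits) != len(token_ids):
        raise ValueError("expected one logit vector per input position")
    losses = []
    # The last position has no next token to predict, so it is skipped.
    for t in range(len(token_ids) - 1):
        scores = logits[t]
        target = token_ids[t + 1]
        # Numerically stable log-sum-exp: subtract the max before exponentiating.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # -log softmax(scores)[target]
        losses.append(log_z - scores[target])
    return sum(losses) / len(losses)


# Toy check: uniform scores over a 4-token vocabulary give a loss of ln(4).
uniform = [[0.0] * 4 for _ in range(3)]
print(causal_lm_loss(uniform, [0, 1, 2]))  # ≈ 1.386
```

A model that already puts almost all of its probability on the correct next token drives this loss toward zero, which is what pretraining on the 320B-token corpus optimizes for.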
# Intended use
GPT-SW3 is an autoregressive large language model capable of generating coherent text in 5 different languages and 4 programming languages. GPT-SW3 can also be instructed to perform text tasks that it has not been explicitly trained for, by casting them as text generation tasks. AI Sweden shares GPT-SW3 in a controlled pre-release with organizations and individuals in the Nordic NLP ecosystem who can contribute to the validation and testing of the models and provide feedback to the community. This is an important step in validating the models and in collecting feedback on both what works well and what does not.

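One common way to cast a task as text generation is a few-shot prompt: a short instruction, a few solved examples, and an unfinished final example that the model is expected to complete. The sketch below only builds such a prompt string. The helper `build_few_shot_prompt`, the `Text:`/`Sentiment:` layout, and the Swedish sentiment examples are illustrative assumptions, not a format GPT-SW3 is known to have been trained on.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str, examples: List[Tuple[str, str]], query: str
) -> str:
    """Cast a labeling task as text continuation (hypothetical format).

    Each solved example is written as a "Text:"/"Sentiment:" pair, and the
    prompt ends with an unanswered "Sentiment:" line for the model to complete.
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")  # blank line between examples
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Illustrative Swedish sentiment task (examples made up for this sketch):
# "Decide whether the text is positive or negative."
few_shot_prompt = build_few_shot_prompt(
    "Avgör om texten är positiv eller negativ.",
    [
        ("Vilken underbar dag!", "positiv"),
        ("Maten var kall och smaklös.", "negativ"),
    ],
    "Filmen var fantastisk.",
)
print(few_shot_prompt)
```

Passing a string like `few_shot_prompt` as the `prompt` in the generation code of the How to use section would ask the model to continue after `Sentiment:`; how reliably it answers is something to check empirically.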
# Limitations
Like other large language models, whose output quality depends on the diversity (or lack thereof) of their training data, GPT-SW3 has limitations in areas such as bias and safety. GPT-SW3 can also have quality issues in terms of generation diversity and hallucination. The model may:

- overrepresent some viewpoints and underrepresent others;
- contain stereotypes;
- generate hateful, abusive, violent, discriminatory, or prejudicial language;
- make errors, including producing incorrect information as if it were factual;
- generate irrelevant or repetitive outputs;
- generate content that may not be appropriate for all settings, including sexual content.

By releasing with the RAIL license, we also hope to increase communication, transparency, and the study of large language models.

# How to use
The following code snippet loads our tokenizer and model, and moves the model to the GPU if one is available.

```python
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

# Initialize variables
model_name = "AI-Sweden/gpt-sw3-356m"
device = "cuda:0" if torch.cuda.is_available() else "cpu"
prompt = "Träd är fina för att"  # Swedish: "Trees are nice because"

# Initialize tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout
model.to(device)
```

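Generating text using the `generate` method is done as follows:
```python
# Tokenize the prompt and move the token ids to the same device as the model
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(device)

# Sample up to 100 new tokens; [0] selects the single sequence in the batch,
# which contains the prompt tokens followed by the generated continuation
generated_token_ids = model.generate(
    inputs=input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.6,
    top_p=1,
)[0]

# Decode the full sequence (prompt + continuation) back into text
generated_text = tokenizer.decode(generated_token_ids)
```
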
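The `temperature` and `top_p` arguments control how each next token is drawn when `do_sample=True`. The stdlib-only sketch below shows the general idea for a single step: temperature scaling, then nucleus (top-p) filtering, then sampling from the renormalized distribution. The function `sample_next_token` is hypothetical; the tensor-based implementation in transformers differs in detail, and `generate` may also apply other filters (such as `top_k`) that this sketch ignores. With `top_p=1`, as in the snippet above, the nucleus filter keeps every token, so only the temperature reshapes the distribution.

```python
import math
import random
from typing import List, Optional


def sample_next_token(
    logits: List[float],
    temperature: float = 0.6,
    top_p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick one token id via temperature scaling and nucleus (top-p) filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if rng is None:
        rng = random.Random()
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of most likely tokens whose cumulative
    # probability reaches top_p; top_p=1.0 keeps every token.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Sample from the kept tokens, renormalized by their total mass.
    r = rng.random() * sum(probs[i] for i in kept)
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]  # guard against floating-point rounding


rng = random.Random(0)
logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
print([sample_next_token(logits, temperature=0.6, top_p=1.0, rng=rng) for _ in range(10)])
```

Lowering `temperature` toward 0 makes the highest-scoring token dominate, while lowering `top_p` cuts off the unlikely tail before sampling.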
A convenient alternative to the `generate` method is the Hugging Face pipeline, which handles most of the work for you:
```python
# Wrap the already-loaded model and tokenizer in a text-generation pipeline
generator = pipeline("text-generation", tokenizer=tokenizer, model=model, device=device)

# The pipeline returns a list with one dict per generated sequence
generated = generator(prompt, max_new_tokens=100, do_sample=True, temperature=0.6, top_p=1)[0]["generated_text"]
```

# Compliance
The release of GPT-SW3 consists of model weights, a configuration file, a tokenizer file, and a vocabulary file. None of these files contain any personally identifiable information (PII) or any copyrighted material.