---
library_name: transformers
tags:
- text-generation-inference
license: mit
language:
- en
---

# Model Card for gpt2-xl

<!-- Provide a quick summary of what the model is/does. -->

GPT-2 XL is the 1.5-billion-parameter version of GPT-2, an autoregressive transformer language model developed by OpenAI for English text generation.

## Model Details

### Model Description

This model card covers gpt2-xl, a large autoregressive language model optimized for text-generation tasks. The model uses the GPT-2 architecture developed by OpenAI.

- **Model type:** Autoregressive language model
- **Language(s) (NLP):** English
- **License:** MIT

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

The model can be used for text generation tasks, such as completing sentences or generating coherent paragraphs.
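
For quick experiments, the model can be loaded through the `transformers` text-generation pipeline. The sketch below is illustrative; the prompt and sampling settings are examples, not recommended defaults for this model.

```python
# Minimal sketch of direct use via the transformers pipeline API.
# The prompt and sampling settings are illustrative only.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2-xl")

# Generate a short continuation of a prompt.
outputs = generator("Bananas are a great", max_new_tokens=40, do_sample=True, top_k=50)
print(outputs[0]["generated_text"])
```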

## Bias, Risks, and Limitations

The model may exhibit biases present in the training data and could generate inappropriate or sensitive content. Users should exercise caution when deploying the model in production.

### Recommendations

Users should be aware of potential biases and limitations of the model, particularly when used in applications that involve sensitive or high-stakes content.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the gpt2-xl tokenizer and model weights from the Hugging Face Hub.
model_name = "gpt2-xl"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize a prompt and generate a greedy (deterministic) continuation.
input_txt = "Bananas are a great"
input_ids = tokenizer(input_txt, return_tensors="pt")["input_ids"]

output = model.generate(input_ids, max_length=200, do_sample=False)
print(tokenizer.decode(output[0]))
```
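
Greedy decoding as above is deterministic. For more varied output, sampling can be enabled; the values below are a hedged sketch, not settings documented for this model.

```python
# Sampling-based generation; the temperature / top_k / top_p values are
# illustrative defaults, not settings prescribed for gpt2-xl.
output = model.generate(
    input_ids,
    max_length=200,
    do_sample=True,
    temperature=0.8,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(output[0]))
```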


## Training Details

### Training Data

The model was trained on WebText, OpenAI's corpus of roughly 40 GB of internet text scraped from outbound Reddit links, spanning news articles, blogs, and other web pages.

#### Training Hyperparameters

- **Training regime:** Autoregressive (causal) language-modeling objective at large scale
- **Compute infrastructure:** GPUs (specific details not disclosed)
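
Concretely, the autoregressive objective minimizes the negative log-likelihood of each token given its preceding context:

```latex
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})
```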

## Evaluation

### Testing Data, Factors & Metrics

The model was evaluated on standard language modeling benchmarks, including perplexity scores on held-out data.
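
Perplexity on a held-out text is the exponential of the average next-token cross-entropy loss. The sketch below illustrates the computation with `transformers`; the sample text and setup are placeholders, not the original benchmark configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hedged sketch: perplexity = exp(mean next-token cross-entropy loss).
# The held-out text below is a placeholder, not an official benchmark.
tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
model = AutoModelForCausalLM.from_pretrained("gpt2-xl")
model.eval()

text = "A short held-out passage used only to illustrate the computation."
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing input_ids as labels makes the model compute the shifted
    # cross-entropy loss over the sequence.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"Perplexity: {torch.exp(loss).item():.2f}")
```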