Python backend
Setup
pip install -r requirements.txt
chmod +x launch.sh
Execution
./launch.sh
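Once launch.sh has started, the server should be reachable on port 6006 (see Usage below). A minimal sketch to check that something is listening, assuming the server runs locally:

```python
import socket

# Probe port 6006, the port the API listens on per the Usage section.
# Assumes the server was started locally via ./launch.sh.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.settimeout(2)
    if sock.connect_ex(("127.0.0.1", 6006)) == 0:
        print("Server is up on port 6006")
    else:
        print("Nothing listening on port 6006 yet")
```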
Usage
The API listens on port 6006, on the route autocomplete, and accepts POST requests.
Query it like this: POST http://<url>:6006/autocomplete
The required argument is context, a string of characters (ideally a sentence) which will be converted into tokens and fed to GPT-2.
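For example, a minimal query using Python's requests library. This is a sketch under two assumptions: the server runs on localhost (substitute your own URL), and the endpoint accepts a JSON body, as the example at the end of this document suggests:

```python
import requests

# Minimal query: only the required "context" argument.
# "http://localhost:6006" is a placeholder; substitute your server's URL.
response = requests.post(
    "http://localhost:6006/autocomplete",
    json={"context": "That man is just another"},
)
print(response.json())
```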
The optional arguments are detailed below (a request sketch that sets several of them follows the list):
length
is an unsigned int which sets the maximum length (in tokens) of the generated sentence. Default: 100.
n_samples
is an int (0 < n_samples <= 3) which sets the maximum number of samples generated. Default: 3.
max_time
is a non-negative float which sets a heuristic limit on the maximum time spent generating sentences. It is a heuristic because it is not exact: it can slightly overflow. Default: infinite.
model_size
takes "small" or "medium" as input and corresponds to the GPT-2 model size. Default: small.
temperature
is a float setting the sampling temperature of the model. Default: 1.
max_tokens
is an int setting the maximum number of tokens that will be fed into the model. Default: 256.
top_p
is a float (0 < top_p < 1) for nucleus sampling; only tokens with a cumulative probability up to top_p are selected for multinomial sampling. Default: 0.9.
top_k
is an int; only the top k tokens are selected for multinomial sampling. Default: 256.
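Here is a sketch of a request that sets several of the optional arguments listed above, under the same assumptions as before (local placeholder URL, JSON body):

```python
import requests

# Request three short samples from the medium model.
# All parameter names come from the list above; the URL is a placeholder.
payload = {
    "context": "That man is just another",
    "length": 50,           # cap each generated sentence at 50 tokens
    "n_samples": 3,         # up to 3 samples (must be <= 3)
    "max_time": 2.0,        # heuristic cap of ~2 seconds of generation
    "model_size": "medium",
    "temperature": 0.8,
    "top_p": 0.9,
    "top_k": 256,
}
response = requests.post("http://localhost:6006/autocomplete", json=payload)
print(response.json())
```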
Return format
The server returns a set of sentences generated from the given context. Their format is:
{sentences: {value: string, time: number}[], time: number}
Example:
With the POST parameters:
{
  "context": "That man is just another",
  "n_samples": 3
}
The response is as follows:
{
  "sentences": [
    {"value": " handicapped working man.", "time": 0.15415167808532715},
    {"value": " guy, doing everything his manly nature requires.", "time": 0.2581148147583008},
    {"value": " guy, Mohr said.", "time": 0.17547011375427246}
  ],
  "time": 0.264873743057251
}
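Reading this response from Python is then straightforward. A short sketch that mirrors the format above, under the same local-server assumption as the earlier examples:

```python
import requests

# Same hypothetical local server and placeholder URL as in the sketches above.
data = requests.post(
    "http://localhost:6006/autocomplete",
    json={"context": "That man is just another", "n_samples": 3},
).json()

for sentence in data["sentences"]:
    # Each entry carries the generated text and its own generation time.
    print(f'{sentence["time"]:.3f}s -> {sentence["value"]}')

# Total time for the whole request, in seconds.
print(f'total: {data["time"]:.3f}s')
```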