
Python backend

Setup

pip install -r requirements.txt
chmod +x launch.sh

Execution

./launch.sh

Usage

The API listens on port 6006 at the route /autocomplete and accepts POST requests. Query it like this: POST http://<url>:6006/autocomplete

The required argument is context, a string (ideally a sentence) which will be converted into tokens and fed to GPT-2.
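
For example, a minimal query from Python might look like this (a sketch: localhost is a placeholder for your server's host, and sending the parameters as a JSON body follows the example later in this document):

import requests

# Minimal request: only the required "context" argument is sent.
# "localhost" is a placeholder; replace it with your server's host.
resp = requests.post(
    "http://localhost:6006/autocomplete",
    json={"context": "That man is just another"},
).json()
print(resp)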

The optional arguments are detailed below; a sample request combining several of them follows the list:

length: unsigned int; sets the maximum length (in tokens) of the generated sentence. default: 100

n_samples: int, 0 < n_samples <= 3; sets the maximum number of samples generated. default: 3

max_time: unsigned float; sets a heuristic limit on the time spent generating sentences. It is a heuristic because it is not exact: generation can slightly overrun it. default: infinite

model_size: "small" or "medium"; the GPT-2 model size to use. default: small

temperature: float; sampling temperature of the model. default: 1

max_tokens: int; maximum number of tokens that will be fed into the model. default: 256

top_p: float, 0 < top_p < 1; nucleus sampling: only tokens with a cumulative probability of top_p will be kept for multinomial sampling. default: 0.9

top_k: int; only the top k tokens will be kept for multinomial sampling. default: 256
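
For instance, the body of a request tuning several optional arguments could look like this (values are purely illustrative):

{
    "context": "That man is just another",
    "n_samples": 2,
    "length": 50,
    "temperature": 0.8,
    "top_p": 0.9
}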

Return format

The server returns a set of sentences generated from the context, in the following format:

{sentences: {value: string, time: number}[], time: number}
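
A rough Python equivalent of this shape (a sketch for readers of the backend code; the class names here are hypothetical, not part of the API):

from dataclasses import dataclass
from typing import List

@dataclass
class Sentence:
    value: str   # the generated text
    time: float  # generation time for this sample

@dataclass
class AutocompleteResponse:
    sentences: List[Sentence]
    time: float  # total time spent on the request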

Example:

With the following POST parameters:

{
    "context": "That man is just another",
    "samples": 3
}

The response is as follows:

{
    "sentences": [
        {"value": " handicapped working man.", "time": 0.15415167808532715}, 
        {"value": " guy, doing everything his manly nature requires.", "time": 0.2581148147583008},
        {"value": " guy, Mohr said.", "time": 0.17547011375427246}
    ],  
    "time": 0.264873743057251
}
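
A short sketch of consuming such a response in Python, reusing resp from the request example above (the time values appear to be seconds, judging from the example):

# resp is the decoded JSON body from the request sketch above.
for sentence in resp["sentences"]:
    print(f'{sentence["time"]:.3f}s: {sentence["value"]}')

# The top-level "time" covers the whole request.
print(f'total: {resp["time"]:.3f}s')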