---
license: bsd-3-clause
tags:
- endpoints-template
pipeline_tag: text-generation
---

# Sharded fork of [Salesforce/codegen-6B-mono](https://huggingface.co/Salesforce/codegen-6B-mono) with a custom pipeline.py

This repository implements a custom `pipeline` task for `text-generation` for 🤗 Inference Endpoints, enabling LLM inference with bitsandbytes quantization. The code for the customized pipeline is in [pipeline.py](https://huggingface.co/philschmid/codegen-6B-mono-sharded-bnb/blob/main/pipeline.py). There is also a [notebook](https://huggingface.co/philschmid/codegen-6B-mono-sharded-bnb/blob/main/create_handler.ipynb) included.

### Expected request payload

```json
{
  "inputs": "# load distilbert model and initialize text-classification pipeline\nmodel_id = 'distil",
  "parameters": {
    "top_k": 100,
    "max_length": 64,
    "early_stopping": true,
    "do_sample": true,
    "eos_token_id": 50256
  }
}
```

Below is an example of how to run a request using Python and `requests`.

## Run Request

```python
import requests as r

ENDPOINT_URL = ""  # your Inference Endpoint URL
HF_TOKEN = ""      # your Hugging Face access token

parameters = {
    "top_k": 100,
    "max_length": 64,
    "early_stopping": True,
    "do_sample": True,
    "eos_token_id": 50256,
}


def predict(code_snippet: str):
    payload = {"inputs": code_snippet, "parameters": parameters}
    response = r.post(
        ENDPOINT_URL, headers={"Authorization": f"Bearer {HF_TOKEN}"}, json=payload
    )
    return response.json()


prediction = predict(
    code_snippet="# load distilbert model and initialize text-classification pipeline\nmodel_id = 'distil"
)
```

Expected output:

```python
{'generated_text': "# load distilbert model and initialize text-classification pipeline\nmodel_id = 'distilbert-base-uncased'\nmodel_url = 'https://tfhub.dev/tensorflow/small_bert/1'\n\nmodel_dir = './distilBERT'"}
```
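As a quick offline sanity check, the request body can be built and serialized locally before pointing it at a live endpoint. This is a minimal sketch; `build_payload` is a hypothetical helper for illustration, not part of `pipeline.py`:

```python
import json

# Generation parameters matching the expected request payload above
parameters = {
    "top_k": 100,
    "max_length": 64,
    "early_stopping": True,
    "do_sample": True,
    "eos_token_id": 50256,  # CodeGen uses the GPT-2 end-of-text token id
}


def build_payload(code_snippet: str) -> str:
    """Serialize a prompt plus the generation parameters into the JSON body the endpoint expects."""
    payload = {"inputs": code_snippet, "parameters": parameters}
    return json.dumps(payload)


body = build_payload(
    "# load distilbert model and initialize text-classification pipeline\nmodel_id = 'distil"
)
```

Round-tripping `body` through `json.loads` confirms the payload is valid JSON with the `inputs` and `parameters` keys the handler reads.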