Supporting the model on text-generation-inference server

#8
by rsalshalan - opened

Hi all,
Thanks for the great efforts!

I would like to ask why I can't use the model with text-generation-inference.

I tried to launch the server as follows: text-generation-launcher --model-id data/jais-13b-chat (I downloaded the repo locally).

Here are the results:

2023-09-11T11:02:01.949055Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

Traceback (most recent call last):

  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 81, in serve
    server.serve(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 184, in serve
    asyncio.run(

  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 136, in serve_inner
    model = get_model(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 298, in get_model
    raise ValueError(f"Unsupported model type {model_type}")

ValueError: Unsupported model type jais

Am I missing something?

I would appreciate your support, and if you need any more details about this, please let me know.

It works for me, but you need to use the transformers loader. I do not know how to do this using the command line.
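
For reference, a minimal sketch of what loading it with the transformers loader could look like (the generation settings and dtype here are my own placeholders, not from this thread; the local path is the one used in the original post, and trust_remote_code=True is needed because jais ships custom modeling code):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "data/jais-13b-chat"  # local copy of the repo, as in the original post

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,   # assumption: fp16; needs a GPU with enough VRAM for a 13B model
    device_map="auto",
    trust_remote_code=True,      # required: jais uses custom modeling code
)

inputs = tokenizer("The capital of UAE is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))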

I'm getting the same error. How do I run it on a server for inference? Using TGI or anything else? Please help us with the necessary parameters.

I don't know what you mean, but if you want to load it in 4-bit to work with low VRAM, here is what I am using.

On my Windows machine:
CUDA: https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64
Python 3.11

then create a Python virtual environment (i.e. python -m venv venv)

pip install transformers accelerate
pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.1-py3-none-win_amd64.whl
pip install torch==2.0.1+cu117 --index-url https://download.pytorch.org/whl/cu117
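
To sanity-check that the CUDA build of torch actually installed (a quick check of my own, not something from the original post):

import torch

print(torch.__version__)           # should report 2.0.1+cu117
print(torch.cuda.is_available())   # should be True if the CUDA install worked
print(torch.cuda.device_count())   # e.g. 2 on a dual-GPU machine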

Here is a working Python script. I have two GPUs, which is why I am specifying cuda:0 (that GPU has 24 GB of VRAM):

# -*- coding: utf-8 -*-

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Raw string so the Windows backslashes are not treated as escape sequences
model_path = r"C:\AI\ML Models\inception-mbzuai_jais-13b"

device = "cuda:0" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map=device,
    load_in_4bit=True,
    trust_remote_code=True,
    bnb_4bit_compute_dtype=torch.float16,
)


def get_response(text, tokenizer=tokenizer, model=model):
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    inputs = input_ids.to(device)
    input_len = inputs.shape[-1]
    generate_ids = model.generate(
        inputs,
        top_p=0.9,
        temperature=0.3,
        max_length=200 - input_len,
        min_length=input_len + 4,
        repetition_penalty=1.2,
        do_sample=True,
    )
    # batch_decode returns the prompt plus the generated continuation
    response = tokenizer.batch_decode(
        generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
    )[0]
    return response


text = "عاصمة دولة الإمارات العربية المتحدة ه"  # "The capital of the United Arab Emirates is"
print(get_response(text))

text = "The capital of UAE is"
print(get_response(text))

Also, you may need to use peft. I am not sure exactly what it does, but it fixed an error I was getting:

pip install peft

from transformers import BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# quantization config (NF4 4-bit with double quantization, fp16 compute)
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="cuda:0",
    quantization_config=nf4_config,
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)
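
On the earlier question about running it on a server for inference: since TGI rejects the jais model type, one workaround is to wrap the transformers loader in a small HTTP service yourself. A rough sketch with FastAPI (FastAPI/uvicorn and the endpoint name are my assumptions, nothing from this thread; it reuses the tokenizer, model, and get_response defined in the script above):

# pip install fastapi uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class GenerateRequest(BaseModel):
    text: str


@app.post("/generate")
def generate(req: GenerateRequest):
    # assumes get_response from the script above is defined in the same file
    return {"generated_text": get_response(req.text)}

# run with: uvicorn server:app --host 0.0.0.0 --port 8080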
