Encountering an exception while trying to run the model using Manifest

#1
by dudub - opened

Hi, I cloned this model onto my local machine and tried to run it with the following Manifest command:

python3 -m manifest.api.app \
    --model_type huggingface \
    --model_generation_type llama-text-generation \
    --model_name_or_path nsql-llama-2-7B \
    --device 0

but I'm getting this exception:

Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "/lib/python3.11/site-packages/manifest/api/app.py", line 301, in
main()
File "/lib/python3.11/site-packages/manifest/api/app.py", line 148, in main
model = MODEL_CONSTRUCTORS[model_type](
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.11/site-packages/manifest/api/models/huggingface.py", line 474, in __init__
tokenizer = LlamaTokenizer.from_pretrained(self.model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1825, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1988, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama.py", line 96, in init
self.sp_model.Load(vocab_file)
File "/lib/python3.11/site-packages/sentencepiece/init.py", line 905, in Load
return self.LoadFromFile(model_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.11/site-packages/sentencepiece/init.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Internal: /Users/runner/work/sentencepiece/sentencepiece/src/sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

I would appreciate your help here.

NumbersStation org

Hi @dudub ,

Can you try this cmd?

python3 -m manifest.api.app \
    --model_type huggingface \
    --model_generation_type text-generation \
    --model_name_or_path NumbersStation/nsql-llama-2-7B \
    --device 0

The current version of Manifest uses an old class to load LLaMA-based models; we will update this in a future Manifest release. Thanks!
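
For anyone hitting the same SentencePiece ParseFromArray error: a quick way to check whether the tokenizer files themselves are the problem is to load them directly with transformers, outside Manifest (a minimal sketch; loading from the Hub id is a hedge against a local clone that still contains un-fetched git-lfs pointer files instead of the real tokenizer.model):

    from transformers import AutoTokenizer

    # Loading from the Hub id sidesteps a possibly truncated local tokenizer.model
    tokenizer = AutoTokenizer.from_pretrained("NumbersStation/nsql-llama-2-7B")
    print(tokenizer("SELECT 1;"))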

@senwu
Thanks for the quick answer!
Yes, I ran it and it looks like the server is up and running, but now I'm getting this error while trying to communicate with the LLM through LangChain:

 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5002
 * Running on http://192.168.1.108:5002
Press CTRL+C to quit
127.0.0.1 - - [02/Aug/2023 23:37:28] "POST /params HTTP/1.1" 200 -
127.0.0.1 - - [02/Aug/2023 23:37:28] "POST /params HTTP/1.1" 200 -
The following model_kwargs are not used by the model: ['token_type_ids'] (note: typos in the generate arguments will also show up in this list)
127.0.0.1 - - [02/Aug/2023 23:37:28] "POST /completions HTTP/1.1" 400 -

That's how I configured it on the LangChain side:

manifest = Manifest(
    client_name="huggingface",
    client_connection="http://127.0.0.1:5002"
)

local_llm = ManifestWrapper(
    client=manifest, 
    llm_kwargs={"temperature": 0.0, "max_tokens": 1024}, 
    verbose=True
)

and used the LLM in a SQL agent.
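
To narrow down whether the 400 comes from LangChain or from the server itself, here is a minimal sketch that queries the Manifest server directly with the manifest Python client, using the same connection settings as above (the prompt string is just an example):

    from manifest import Manifest

    manifest = Manifest(
        client_name="huggingface",
        client_connection="http://127.0.0.1:5002",
    )
    # If this also fails with a 400, the problem is server-side, not LangChain.
    print(manifest.run("Hello World", max_tokens=32, temperature=0.0))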

NumbersStation org

Hi @dudub ,

I am not very familiar with LangChain, but from the error message above it seems like LangChain sends the wrong arguments to Manifest. Can you double-check the arguments?

@senwu
I think it's not an issue with LangChain but with Manifest, because I'm getting the same error with a simple POST request using Postman:

curl --location 'http://127.0.0.1:5002/completions' \
--header 'Content-Type: application/json' \
--data '{
    "prompt": "Hello World",
    "max_tokens": 1024,
    "temperature": 0.0,
    "repetition_penalty": 1,
    "top_k": 50,
    "top_p": 10,
    "do_sample": "True",
    "n": 1,
    "max_new_tokens": 1024
}'

I think it's something related to the Manifest server/transformers.
https://huggingface.co/OpenAssistant/falcon-40b-sft-mix-1226/discussions/2
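
The linked thread describes the same failure mode: the tokenizer emits a token_type_ids field that model.generate() for LLaMA-style models does not accept. A hedged sketch of the workaround at the plain transformers level, outside Manifest (the prompt is just an example):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("NumbersStation/nsql-llama-2-7B")
    model = AutoModelForCausalLM.from_pretrained("NumbersStation/nsql-llama-2-7B")

    inputs = tokenizer("Hello World", return_tensors="pt")
    inputs.pop("token_type_ids", None)  # drop the key that generate() rejects
    output = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))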

Are you familiar with another way to run this model locally, besides Manifest?

NumbersStation org

Hi @dudub ,

Which transformers version are you using? We are using transformers 4.31.0 and it works.
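
If your local version is older, pinning it (assuming a pip-managed environment) should be enough:

    pip install transformers==4.31.0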

FLASK_PORT=7000 python3 -m manifest.api.app --model_type huggingface --model_generation_type text-generation --model_name_or_path NumbersStation/nsql-llama-2-7B --device 0
Model Name: NumbersStation/nsql-llama-2-7B Model Path: NumbersStation/nsql-llama-2-7B
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3/3 [00:20<00:00,  6.87s/it]
Loaded Model DType torch.float32
Usings max_length: 4096
 * Serving Flask app 'app'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:7000
 * Running on http://38.99.106.21:7000
Press CTRL+C to quit
100.110.106.20 - - [03/Aug/2023 10:10:29] "POST /completions HTTP/1.1" 200 -
curl --location 'http://127.0.0.1:7000/completions' --header 'Content-Type: application/json' --data '{
"prompt": "Hello World",
"max_tokens": 1024,
"temperature": 0.0,
"repetition_penalty": 1,
"top_k": 50,
"top_p": 10,
"do_sample": "True",
"n": 1,
"max_new_tokens": 1024
}'
{"id": "0b13e512-4d03-4d97-8f90-e97d5479ead2", "object": "text_completion", "created": 1691082629, "model": "flask_model", "choices": [{"text": "\n", "logprob": -2.203536033630371, "tokens": [13, 2], "token_logprobs": [-1.6638275384902954, -0.5397084951400757]}]}

@senwu
Thanks again for your help! It was indeed that issue, and that error is gone, but now I'm facing a new one...
Can you tell me what machine you are running it on?
I'm trying to run it on my local MacBook Pro (M1 Pro) but I'm getting the following error:
"addmm_impl_cpu_" not implemented for 'Half'

Maybe I can't even run it locally and need to deploy it on a remote machine? (For now, it's just for testing and playing around, of course.)

That's my log when the server starts up:

[2023-08-03 23:19:50,322] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Model Name: nsql-llama-2-7B Model Path: nsql-llama-2-7B
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3/3 [00:29<00:00, 9.86s/it]
Loaded Model DType torch.float16
Usings max_length: 4096

 * Serving Flask app 'app'
 * Debug mode: off
NumbersStation org

We've tested the model on Ubuntu 20.04.
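
The "addmm_impl_cpu_" not implemented for 'Half' error generally means the float16 weights are being run on the CPU, where PyTorch has no half-precision matmul kernels; --device 0 requests a CUDA GPU, which an M1 MacBook doesn't have. A hedged sketch of the usual workaround in plain transformers (cast to float32, or try Apple's "mps" backend where available):

    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "NumbersStation/nsql-llama-2-7B",
        torch_dtype=torch.float32,  # avoid fp16 kernels on CPU
    )
    # On Apple Silicon, the "mps" device can be tried instead of CPU:
    # model = model.to("mps")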

Any luck, @dudub, with using a local database query language? I get the best responses with the OpenAI models, but I'm trying to replicate that with a private LLM setup.

NumbersStation org

We've provided some tutorials about local database query generation here, and you can also use SQLGlot to convert the generated query into the dialect you want.
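
For the dialect conversion, a minimal sketch with the sqlglot package (the query string and dialects here are just examples):

    import sqlglot

    # Transpile a generated query from one SQL dialect to another.
    sql = "SELECT name FROM users LIMIT 10"
    print(sqlglot.transpile(sql, read="postgres", write="sqlite")[0])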

senwu changed discussion status to closed
