Tags: Text Generation, Transformers, PyTorch, English, German, llama, conversational, Inference Endpoints, text-generation-inference

Not compatible with text-generation-inference

#3 by rredl - opened

I'm trying to use this model with TGI (version 1.2, Docker image), but I get the following error:

2023-12-11T10:08:09.585078Z ERROR text_generation_launcher: Method Decode encountered an error.
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 83, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 207, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
    return await self.intercept(
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 21, in intercept
    return await response
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
    raise error
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
    return await behavior(request_or_iterator, context)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 121, in Decode
    generations, next_batch = self.model.generate_token(batch)
  File "/opt/conda/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 893, in generate_token
    next_token_text, prefix_offset, read_offset = self.decode_token(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/model.py", line 83, in decode_token
    new_text = self.tokenizer.decode(
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3550, in decode
    return self._decode(
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 938, in _decode
    filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
  File "/opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 919, in convert_ids_to_tokens
    tokens.append(self._convert_id_to_token(index))
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 243, in _convert_id_to_token
    token = self.sp_model.IdToPiece(index)
  File "/opt/conda/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1045, in _batched_func
    return _func(self, arg)
  File "/opt/conda/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1038, in _func
    raise IndexError('piece id is out of range.')
IndexError: piece id is out of range.
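For context on the final IndexError: it is typically raised when a generated token id is larger than the SentencePiece vocabulary loaded by the slow LlamaTokenizer (the code path shown in the traceback), i.e. the checkpoint's vocab_size is bigger than what tokenizer.model covers. A minimal sketch to check for that mismatch, assuming transformers is installed locally; the model id below is a placeholder for this repository:

from transformers import AutoConfig, AutoTokenizer

model_id = "org/model-name"  # placeholder: substitute this repo's id

# vocabulary size the checkpoint declares
config = AutoConfig.from_pretrained(model_id)

# slow (SentencePiece-backed) tokenizer, the one used in the traceback
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
sp_size = tokenizer.sp_model.get_piece_size()

print("config.vocab_size: ", config.vocab_size)
print("sentencepiece size:", sp_size)

# any id in [sp_size, config.vocab_size) has no piece to decode and
# reproduces "IndexError: piece id is out of range."
if config.vocab_size > sp_size:
    print(f"mismatch: ids {sp_size}..{config.vocab_size - 1} are undecodable")

If the two numbers differ, the crash in TGI's Decode step would be expected whenever sampling lands in that gap; a tokenizer that covers the full vocab_size (or a fast tokenizer.json in the repo) should avoid it, though I can't say for certain that this is the cause here.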
