MistralLite is not running on Text Generation Inference

#17
by soumodeep-semut - opened

Just a month ago I used this model and fine-tuned it for some work, but now it fails to start and raises the error below.

Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 83, in serve
    server.serve(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 207, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 159, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 252, in get_model
    return FlashMistral(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_mistral.py", line 312, in __init__
    SLIDING_WINDOW_BLOCKS = math.ceil(config.sliding_window / BLOCK_SIZE)
TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'

2023-12-27T05:49:01.354948Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

Traceback (most recent call last):

  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 83, in serve
    server.serve(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 207, in serve
    asyncio.run(

  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 159, in serve_inner
    model = get_model(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 252, in get_model
    return FlashMistral(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_mistral.py", line 312, in __init__
    SLIDING_WINDOW_BLOCKS = math.ceil(config.sliding_window / BLOCK_SIZE)

TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'
 rank=0
Error: ShardCannotStart
2023-12-27T05:49:01.453825Z ERROR text_generation_launcher: Shard 0 failed to start
2023-12-27T05:49:01.453846Z  INFO text_generation_launcher: Shutting down shards
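The failing line is easy to reproduce in isolation: the config change set `sliding_window` to null, which loads into Python as `None`, and `flash_mistral.py` then divides it by the paged-attention block size (16 here is an assumption about TGI's `BLOCK_SIZE` at the time):

```python
import math

BLOCK_SIZE = 16       # assumed paged-attention block size in TGI
sliding_window = None  # what config.json now provides for "sliding_window"

try:
    # same expression as flash_mistral.py line 312
    SLIDING_WINDOW_BLOCKS = math.ceil(sliding_window / BLOCK_SIZE)
except TypeError as e:
    print(e)  # unsupported operand type(s) for /: 'NoneType' and 'int'
```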

The problem seems to be related to the last commit, which changed "sliding_window" to null in the config. You can try setting it back to 16384 in your cached config file (it should be at a path like ~/models--amazon--MistralLite/snapshots).
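A minimal sketch of that workaround. You have to locate the cached config.json yourself; the snapshot path in the comment is hypothetical, since the hash varies per download:

```python
import json
from pathlib import Path

def patch_sliding_window(config_path, value=16384):
    """Set "sliding_window" back to an integer in a cached config.json."""
    path = Path(config_path)
    cfg = json.loads(path.read_text())
    cfg["sliding_window"] = value
    path.write_text(json.dumps(cfg, indent=2))
    return cfg

# Example (hypothetical snapshot path -- adjust to your own cache):
# patch_sliding_window(
#     Path.home() / "models--amazon--MistralLite"
#     / "snapshots" / "<snapshot-hash>" / "config.json"
# )
```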

I am facing the same issue.
@mihailch Can you please explain how I can set "sliding_window" if I'm using the method below to run TGI?

import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()  # assumes a SageMaker execution role

hub = {
    'HF_MODEL_ID': 'amazon/MistralLite',
    'SM_NUM_GPUS': json.dumps(1)
}

huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.1.0"),
    env=hub,
    role=role,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

@yinsong1986
Your last commit to the config file, which changed "sliding_window", is causing this issue with the SageMaker endpoint and the SageMaker SDK.
Can you please suggest a solution?
Thanks in advance.

Amazon Web Services org

@dorike
Does this temporary solution work? Basically, use the previous revision for the time being:

hub = {
    'HF_MODEL_ID': 'amazon/MistralLite',
    'SM_NUM_GPUS': json.dumps(1),
    'HF_MODEL_REVISION': '23486089ab7ba741b34adc69ab7555885f8abe71'
}
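For reference, newer TGI releases avoid this crash by treating a null sliding window as "disabled" rather than dividing it. A minimal sketch of that guard (the BLOCK_SIZE value and the None fallback are assumptions for illustration, not the exact upstream code):

```python
import math

BLOCK_SIZE = 16  # assumed paged-attention block size

def sliding_window_blocks(sliding_window):
    """Number of KV-cache blocks covered by the sliding window,
    or None when the config disables it (sliding_window: null)."""
    if sliding_window is None:
        return None  # no sliding window: attend over the full context
    return math.ceil(sliding_window / BLOCK_SIZE)

print(sliding_window_blocks(16384))  # 1024
print(sliding_window_blocks(None))   # None
```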


@chenwuml
Yup, this actually worked, thanks!
I learned something new today.

Thanks again.
