No lm_head weight?

#1
by prannaykaul - opened

I might be missing something, but there isn't an lm_head weight listed in model.safetensors.index.json?
Is this intentional?

OpenAssistant org

This is because safe serialization (safetensors) was used, which doesn't allow aliasing, and the lm_head weight is tied to transformer.word_embeddings.weight. Did you have any problems loading the model? I checked with both transformers and TGI, and each loaded the model properly.
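For what it's worth, resolving the tied weight on the loader side amounts to an alias fallback. A minimal sketch, assuming a plain dict of tensors; the `get_tensor` helper and its alias map are illustrative, not TGI's actual API (though the tensor names match this model):

```python
def get_tensor(weights: dict, name: str):
    # safetensors refuses to store two entries aliasing the same storage,
    # so a tied lm_head.weight is simply absent from the file; fall back
    # to the embedding weight it is tied to.
    aliases = {"lm_head.weight": "transformer.word_embeddings.weight"}
    if name not in weights and name in aliases:
        name = aliases[name]
    if name not in weights:
        raise RuntimeError(f"weight {name} does not exist")
    return weights[name]
```

TGI's `RuntimeError: weight lm_head.weight does not exist` below is exactly the missing-key branch firing, because the 0.9.x loader did not yet apply such a fallback for this architecture.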

export model=OpenAssistant/falcon-40b-megacode2-oasst
export num_shard=8
export volume=/home/ubuntu/prannay-efs/checkpoints/tgi-FalconOA-megacode2 # share a volume with the Docker container to avoid downloading weights every run

export max_input_length=2048
export max_total_tokens=2049
export docker_image_id=ghcr.io/huggingface/text-generation-inference:0.9.4 # ghcr.io/huggingface/text-generation-inference:0.9.2
export falconlite_tgi_port=448

# Note: volume, max_input_length and max_total_tokens are defined above but
# never passed to docker run, so the launcher falls back to its defaults
# (the log shows max_input_length: 1024, max_total_tokens: 2048).
docker run -d --gpus all \
    --shm-size 1g -p $falconlite_tgi_port:80 -v /home:/home $docker_image_id \
    --model-id $model --num-shard ${num_shard} \
    --trust-remote-code --dtype bfloat16

This is the script I am using to launch the model.
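One way to check the checkpoint before launching is to look at which tensors the safetensors index actually maps. A small sketch; the helper is hypothetical, but `model.safetensors.index.json` and its `weight_map` key are the standard sharded-safetensors conventions:

```python
import json
from pathlib import Path

def tensor_in_index(index_path: str, name: str) -> bool:
    # "weight_map" maps every stored tensor name to the shard file
    # holding it; a tied lm_head.weight will not appear at all.
    weight_map = json.loads(Path(index_path).read_text())["weight_map"]
    return name in weight_map
```

Running this against the downloaded checkpoint would show `transformer.word_embeddings.weight` present and `lm_head.weight` absent, consistent with the error in the logs below.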

2023-08-18T19:54:41.846461Z  INFO text_generation_launcher: Args { model_id: "/home/ubuntu/prannay-efs/checkpoints/tgi-FalconOA-megacode2", revision: None, validation_workers: 2, sharded: None, num_shard: Some(8), quantize: None, dtype: Some(BFloat16), trust_remote_code: true, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "97f4f178d973", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2023-08-18T19:54:41.846495Z  WARN text_generation_launcher: `trust_remote_code` is set. Trusting that model `/home/ubuntu/prannay-efs/checkpoints/tgi-FalconOA-megacode2` do not contain malicious code.
2023-08-18T19:54:41.846499Z  INFO text_generation_launcher: Sharding model on 8 processes
2023-08-18T19:54:41.846634Z  INFO download: text_generation_launcher: Starting download process.
2023-08-18T19:54:43.393983Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.

2023-08-18T19:54:43.748718Z  INFO download: text_generation_launcher: Successfully downloaded weights.
2023-08-18T19:54:43.749128Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2023-08-18T19:54:43.749863Z  INFO shard-manager: text_generation_launcher: Starting shard rank=1
2023-08-18T19:54:43.750865Z  INFO shard-manager: text_generation_launcher: Starting shard rank=2
2023-08-18T19:54:43.750865Z  INFO shard-manager: text_generation_launcher: Starting shard rank=4
2023-08-18T19:54:43.750865Z  INFO shard-manager: text_generation_launcher: Starting shard rank=3
2023-08-18T19:54:43.752334Z  INFO shard-manager: text_generation_launcher: Starting shard rank=5
2023-08-18T19:54:43.753078Z  INFO shard-manager: text_generation_launcher: Starting shard rank=6
2023-08-18T19:54:43.753054Z  INFO shard-manager: text_generation_launcher: Starting shard rank=7
2023-08-18T19:54:53.762085Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=4
2023-08-18T19:54:53.763929Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2023-08-18T19:54:53.765914Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2023-08-18T19:54:53.767026Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=7
2023-08-18T19:54:53.774333Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=5
2023-08-18T19:54:53.780372Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=6
2023-08-18T19:54:53.786063Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2023-08-18T19:54:53.786520Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2023-08-18T19:55:00.297334Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 78, in serve
    server.serve(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 184, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 136, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 208, in get_model
    return FlashRWSharded(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_rw.py", line 64, in __init__
    model = FlashRWForCausalLM(config, weights)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_rw_modeling.py", line 624, in __init__
    self.lm_head = TensorParallelHead.load(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 199, in load
    weight = weights.get_sharded(f"{prefix}.weight", dim=0)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 101, in get_sharded
    filename, tensor_name = self.get_filename(tensor_name)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 52, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight lm_head.weight does not exist

[The identical traceback, ending in `RuntimeError: weight lm_head.weight does not exist`, is logged for each of the remaining seven shards.]

2023-08-18T19:55:03.790575Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2023-08-18T19:55:03.790920Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=5
2023-08-18T19:55:03.796426Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=4
2023-08-18T19:55:03.799873Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=1
2023-08-18T19:55:03.802729Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=7
2023-08-18T19:55:03.804963Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=6
2023-08-18T19:55:03.807944Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=3
2023-08-18T19:55:03.834266Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=2
2023-08-18T19:55:13.209919Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
[W socket.cpp:601] [c10d] The client socket has failed to connect to [localhost]:29500 (errno: 99 - Cannot assign requested address).
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type falcon to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
Traceback (most recent call last):

  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 78, in serve
    server.serve(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 184, in serve
    asyncio.run(

  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 136, in serve_inner
    model = get_model(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 208, in get_model
    return FlashRWSharded(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_rw.py", line 64, in __init__
    model = FlashRWForCausalLM(config, weights)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_rw_modeling.py", line 624, in __init__
    self.lm_head = TensorParallelHead.load(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 199, in load
    weight = weights.get_sharded(f"{prefix}.weight", dim=0)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 101, in get_sharded
    filename, tensor_name = self.get_filename(tensor_name)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 52, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")

RuntimeError: weight lm_head.weight does not exist
 rank=7
2023-08-18T19:55:13.291609Z ERROR text_generation_launcher: Shard 7 failed to start
2023-08-18T19:55:13.291631Z  INFO text_generation_launcher: Shutting down shards
2023-08-18T19:55:13.342262Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type falcon to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
Traceback (most recent call last):

  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 78, in serve
    server.serve(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 184, in serve
    asyncio.run(

  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 136, in serve_inner
    model = get_model(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 208, in get_model
    return FlashRWSharded(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_rw.py", line 64, in __init__
    model = FlashRWForCausalLM(config, weights)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_rw_modeling.py", line 624, in __init__
    self.lm_head = TensorParallelHead.load(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 199, in load
    weight = weights.get_sharded(f"{prefix}.weight", dim=0)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 101, in get_sharded
    filename, tensor_name = self.get_filename(tensor_name)

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 52, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")

RuntimeError: weight lm_head.weight does not exist
 rank=2
2023-08-18T19:55:13.381545Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=1
2023-08-18T19:55:13.498104Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=0
2023-08-18T19:55:13.615350Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=4
2023-08-18T19:55:13.714285Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=3
2023-08-18T19:55:13.783450Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=5
2023-08-18T19:55:13.852918Z  INFO shard-manager: text_generation_launcher: Shard terminated rank=6

Here we can see a runtime error: the launcher cannot find the `lm_head.weight` tensor in the checkpoint.
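This is consistent with the tied-weights explanation above: `lm_head.weight` and `transformer.word_embeddings.weight` point at the same underlying tensor, and a format that forbids aliasing stores the shared storage under only one name. The following is a minimal pure-Python sketch of that behaviour (hypothetical helper, not the actual safetensors code):

```python
# Sketch of why a tied lm_head weight disappears when serialized
# in a format that does not allow aliasing (e.g. safetensors).

# The embedding matrix and the LM head share one tensor object.
embedding_weight = [[0.1, 0.2], [0.3, 0.4]]  # stand-in for a real tensor
state_dict = {
    "transformer.word_embeddings.weight": embedding_weight,
    "lm_head.weight": embedding_weight,  # same object: tied weights
}

def save_without_aliases(state_dict):
    """Mimic a no-alias format: each underlying tensor is stored once,
    under the first name that references it; alias names are dropped."""
    seen_ids = set()
    stored = {}
    for name, tensor in state_dict.items():
        if id(tensor) in seen_ids:
            continue  # duplicate storage -> this name is not saved
        seen_ids.add(id(tensor))
        stored[name] = tensor
    return stored

saved = save_without_aliases(state_dict)
print(sorted(saved))  # only 'transformer.word_embeddings.weight' survives
```

A loader that asks for `lm_head.weight` by name, as TGI 0.9.4 does here, then fails with exactly the `RuntimeError` shown in the log.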

OpenAssistant org

There should be an alias defined for `lm_head.weight` in flash_rw.py, see https://github.com/huggingface/text-generation-inference/blob/c4422e567811051e40469551c58a8078158a6656/server/text_generation_server/models/flash_rw.py#L57. It was added to TGI two weeks ago, so it is probably not part of the officially released Docker container yet.
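Conceptually, the fix maps the missing name onto the stored one at lookup time. A hedged sketch of that alias resolution (the names mirror this discussion; this is not TGI's actual implementation):

```python
# Sketch of alias resolution for tied weights: if a requested tensor
# name is missing from the checkpoint, fall back to its known alias.

ALIASES = {"lm_head.weight": "transformer.word_embeddings.weight"}

def resolve_tensor_name(available, name, aliases=ALIASES):
    """Return the stored name under which `name` can be loaded,
    following the alias table when the name itself is absent."""
    if name in available:
        return name
    alias = aliases.get(name)
    if alias is not None and alias in available:
        return alias
    # Mirrors the error seen in the launcher log above.
    raise RuntimeError(f"weight {name} does not exist")

stored_names = {"transformer.word_embeddings.weight"}
print(resolve_tensor_name(stored_names, "lm_head.weight"))
# resolves to 'transformer.word_embeddings.weight'
```

With a table like this in place, the loader serves `lm_head.weight` from the embedding tensor instead of raising.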

OpenAssistant org

I saved some notes on how to build TGI manually in a gist.

Thanks for your help. I found the edit and built a new Docker image. Thank you also for the TGI build gist; I will take a look.
