Can't use it with vLLM, although gemma-2b from Google is supported

#8
by yaswanth-iitkgp

vLLM supports gemma-2b from Google, but when I try to load this version with vLLM, I get the following error:

Traceback (most recent call last):
  File "/workspace/offline_inference.py", line 17, in <module>
    llm = LLM(model="mustafaaljadery/gemma-2B-10M", gpu_memory_utilization=0.6) 
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 112, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 196, in from_engine_args
    engine = cls(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 110, in __init__
    self.model_executor = executor_class(model_config, cache_config,
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 37, in __init__
    self._init_worker()
  File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 66, in _init_worker
    self.driver_worker.load_model()
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 107, in load_model
    self.model_runner.load_model()
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 95, in load_model
    self.model = get_model(
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 101, in get_model
    model.load_weights(model_config.model, model_config.download_dir,
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/gemma.py", line 390, in load_weights
    param = params_dict[name]
KeyError: 'model.layers.0.self_attn.gate'
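
The KeyError points at the actual incompatibility: this checkpoint ships an extra per-layer tensor, model.layers.N.self_attn.gate, that stock gemma-2b does not have, so vLLM's Gemma weight loader finds no parameter to map it onto. You can confirm this by listing the checkpoint's tensor names, e.g. with a quick safetensors scan (a minimal sketch; it assumes the weights sit in a single model.safetensors file, so adjust the filename if the repo shards them):

from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Download only the weights file and list the tensor names stock Gemma lacks.
path = hf_hub_download("mustafaaljadery/gemma-2B-10M", "model.safetensors")
with safe_open(path, framework="pt") as f:
    extra = [name for name in f.keys() if ".self_attn.gate" in name]
print(extra)  # expect entries like 'model.layers.0.self_attn.gate'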

I am trying to convert to GGUF using llama.cpp/convert.py, but I'm stuck on a similar issue: an unknown tensor name.

Traceback (most recent call last):
  File "/Users/thekumar/git/localmodels/llama.cpp/convert.py", line 1714, in <module>
    main()
  File "/Users/thekumar/git/localmodels/llama.cpp/convert.py", line 1700, in main
    model   = convert_model_names(model, params, args.skip_unknown)
  File "/Users/thekumar/git/localmodels/llama.cpp/convert.py", line 1402, in convert_model_names
    raise ValueError(f"Unexpected tensor name: {name}. Use --skip-unknown to ignore it (e.g. LLaVA)")
ValueError: Unexpected tensor name: model.layers.0.self_attn.gate. Use --skip-unknown to ignore it (e.g. LLaVA)
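
The error message itself points to a workaround: re-run the converter with --skip-unknown so it drops the unrecognized tensors instead of aborting. A sketch of the invocation (the local checkout path is illustrative):

python llama.cpp/convert.py /path/to/gemma-2B-10M --skip-unknown

The caveat is that skipping model.layers.N.self_attn.gate throws away exactly the tensors this variant adds on top of stock gemma-2b, so the resulting GGUF would presumably behave like the base model at best, rather than the 10M-context version.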
