vllm version?
#14 opened by Yhyu13
Hi,
I am on vLLM v0.4.1, and here is the error message from the tokenizer.
Arguments:
NAME=THUDM/glm-4-9b-chat
TL_PL=2
model_path=$NAME
model_names="gpt-4o"
CUDA_VISIBLE_DEVICES=0,1 python3 server_vllm.py --model $model_path --host 127.0.0.1 --port 5051 --max-model-len 8192 --served-model-name $model_names --tensor-parallel-size $TL_PL --trust-remote-code
INFO 06-06 00:32:36 server_vllm.py:160] args: Namespace(host='127.0.0.1', port=5051, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], served_model_name='gpt-4o', grammar_sampling=False, model='/media/home/hangyu5/Documents/Hugging-Face/THUDM/glm-4-9b-chat', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, download_dir=None, load_format='auto', dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=8192, guided_decoding_backend='outlines', worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=2, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=0, swap_space=4, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=5, disable_log_stats=False, quantization=None, enforce_eager=False, max_context_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, device='auto', image_input_type=None, image_token_id=None, image_input_shape=None, image_feature_size=None, scheduler_delay_factor=0.0, enable_chunked_prefill=False, speculative_model=None, num_speculative_tokens=None, speculative_max_model_len=None, model_loader_extra_config=None, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING 06-06 00:32:36 tokenizer.py:123] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
Traceback (most recent call last):
File "/home/hangyu5/Documents/Gitrepo-My/AIResearchVault/repo/LLMApp/functionary/server_vllm.py", line 173, in <module>
engine_args.max_logprobs = len(tokenizer.vocab.keys())
^^^^^^^^^^^^^^^
AttributeError: 'CachedChatGLM4Tokenizer' object has no attribute 'vocab'
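For what it's worth, a possible workaround (untested sketch) would be to patch server_vllm.py to read the vocabulary through get_vocab(), which is part of the standard Hugging Face tokenizer interface, instead of the .vocab attribute that CachedChatGLM4Tokenizer does not expose:

```python
from transformers import AutoTokenizer

# Load the GLM-4 tokenizer; trust_remote_code is required for THUDM models.
tokenizer = AutoTokenizer.from_pretrained(
    "THUDM/glm-4-9b-chat", trust_remote_code=True
)

# server_vllm.py line 173 calls len(tokenizer.vocab.keys()), but custom
# tokenizers such as CachedChatGLM4Tokenizer do not expose a .vocab
# attribute. get_vocab() is the portable accessor defined on the Hugging
# Face tokenizer base class, so it should work in both cases.
max_logprobs = len(tokenizer.get_vocab())
print(max_logprobs)
```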
Thanks
Use vLLM 0.4.3.
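After upgrading (e.g. pip install "vllm==0.4.3"), a minimal sanity check of the installed version, assuming nothing beyond vLLM's own __version__ attribute:

```python
import vllm
from packaging.version import Version

# GLM-4 chat models need vLLM 0.4.3 or newer.
assert Version(vllm.__version__) >= Version("0.4.3"), vllm.__version__
print(vllm.__version__)
```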
Thanks!
zRzRzRzRzRzRzR changed discussion status to closed