Should there be tokenizer files in the repo?

#2
by Yhyu13 - opened

I pulled this repo to a local directory and tried to load it with --model-path set to that local path. However, the Hugging Face transformers library still tries to download the tokenizer config from the Hub, which causes the following error:

β”‚ /home/hangyu5/Documents/Git-repoMy/AIResearchVault/repo/LLM/BLOOM/LLMZoo/llmzoo/deploy/webapp/in β”‚
β”‚ ference.py:235 in chat_loop                                                                      β”‚
β”‚                                                                                                  β”‚
β”‚   232 β”‚   β”‚   debug: bool,                                                                       β”‚
β”‚   233 ):                                                                                         β”‚
β”‚   234 β”‚   # Model                                                                                β”‚
β”‚ ❱ 235 β”‚   model, tokenizer = load_model(                                                         β”‚
β”‚   236 β”‚   β”‚   model_path, device, num_gpus, max_gpu_memory, load_8bit, load_4bit, debug          β”‚
β”‚   237 β”‚   )                                                                                      β”‚
β”‚   238                                                                                            β”‚
β”‚                                                                                                  β”‚
β”‚ /home/hangyu5/Documents/Git-repoMy/AIResearchVault/repo/LLM/BLOOM/LLMZoo/llmzoo/deploy/webapp/in β”‚
β”‚ ference.py:94 in load_model                                                                      β”‚
β”‚                                                                                                  β”‚
β”‚    91 β”‚   β”‚   tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)               β”‚
β”‚    92 β”‚   β”‚   model = AutoModelForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True,   β”‚
β”‚    93 β”‚   else:                                                                                  β”‚
β”‚ ❱  94 β”‚   β”‚   tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)               β”‚
β”‚    95 β”‚   β”‚   model = AutoModelForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True,   β”‚
β”‚    96 β”‚                                                                                          β”‚
β”‚    97 β”‚   if load_8bit:                                                                          β”‚
β”‚                                                                                                  β”‚
β”‚ /home/hangyu5/anaconda3/envs/pheonix/lib/python3.10/site-packages/transformers/models/auto/token β”‚
β”‚ ization_auto.py:642 in from_pretrained                                                           β”‚
β”‚                                                                                                  β”‚
β”‚   639 β”‚   β”‚   β”‚   return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *input   β”‚
β”‚   640 β”‚   β”‚                                                                                      β”‚
β”‚   641 β”‚   β”‚   # Next, let's try to use the tokenizer_config file to get the tokenizer class.     β”‚
β”‚ ❱ 642 β”‚   β”‚   tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)   β”‚
β”‚   643 β”‚   β”‚   if "_commit_hash" in tokenizer_config:                                             β”‚
β”‚   644 β”‚   β”‚   β”‚   kwargs["_commit_hash"] = tokenizer_config["_commit_hash"]                      β”‚
β”‚   645 β”‚   β”‚   config_tokenizer_class = tokenizer_config.get("tokenizer_class")                   β”‚
β”‚                                                                                                  β”‚
β”‚ /home/hangyu5/anaconda3/envs/pheonix/lib/python3.10/site-packages/transformers/models/auto/token β”‚
β”‚ ization_auto.py:486 in get_tokenizer_config                                                      β”‚
β”‚                                                                                                  β”‚
β”‚   483 β”‚   tokenizer_config = get_tokenizer_config("tokenizer-test")                              β”‚
β”‚   484 β”‚   ```"""                                                                                 β”‚
β”‚   485 β”‚   commit_hash = kwargs.get("_commit_hash", None)                                         β”‚
β”‚ ❱ 486 β”‚   resolved_config_file = cached_file(                                                    β”‚
β”‚   487 β”‚   β”‚   pretrained_model_name_or_path,                                                     β”‚
β”‚   488 β”‚   β”‚   TOKENIZER_CONFIG_FILE,                                                             β”‚
β”‚   489 β”‚   β”‚   cache_dir=cache_dir,                                                               β”‚
β”‚                                                                                                  β”‚
β”‚ /home/hangyu5/anaconda3/envs/pheonix/lib/python3.10/site-packages/transformers/utils/hub.py:409  β”‚
β”‚ in cached_file                                                                                   β”‚
β”‚                                                                                                  β”‚
β”‚    406 β”‚   user_agent = http_user_agent(user_agent)                                              β”‚
β”‚    407 β”‚   try:                                                                                  β”‚
β”‚    408 β”‚   β”‚   # Load from URL or cache if already cached                                        β”‚
β”‚ ❱  409 β”‚   β”‚   resolved_file = hf_hub_download(                                                  β”‚
β”‚    410 β”‚   β”‚   β”‚   path_or_repo_id,                                                              β”‚
β”‚    411 β”‚   β”‚   β”‚   filename,                                                                     β”‚
β”‚    412 β”‚   β”‚   β”‚   subfolder=None if len(subfolder) == 0 else subfolder,                         β”‚
β”‚                                                                                                  β”‚
β”‚ /home/hangyu5/anaconda3/envs/pheonix/lib/python3.10/site-packages/huggingface_hub/utils/_validat β”‚
β”‚ ors.py:112 in _inner_fn                                                                          β”‚
β”‚                                                                                                  β”‚
β”‚   109 β”‚   β”‚   β”‚   kwargs.items(),  # Kwargs values                                               β”‚
β”‚   110 β”‚   β”‚   ):                                                                                 β”‚
β”‚   111 β”‚   β”‚   β”‚   if arg_name in ["repo_id", "from_id", "to_id"]:                                β”‚
β”‚ ❱ 112 β”‚   β”‚   β”‚   β”‚   validate_repo_id(arg_value)                                                β”‚
β”‚   113 β”‚   β”‚   β”‚                                                                                  β”‚
β”‚   114 β”‚   β”‚   β”‚   elif arg_name == "token" and arg_value is not None:                            β”‚
β”‚   115 β”‚   β”‚   β”‚   β”‚   has_token = True                                                           β”‚
β”‚                                                                                                  β”‚
β”‚ /home/hangyu5/anaconda3/envs/pheonix/lib/python3.10/site-packages/huggingface_hub/utils/_validat β”‚
β”‚ ors.py:160 in validate_repo_id                                                                   β”‚
β”‚                                                                                                  β”‚
β”‚   157 β”‚   β”‚   raise HFValidationError(f"Repo id must be a string, not {type(repo_id)}: '{repo_   β”‚
β”‚   158 β”‚                                                                                          β”‚
β”‚   159 β”‚   if repo_id.count("/") > 1:                                                             β”‚
β”‚ ❱ 160 β”‚   β”‚   raise HFValidationError(                                                           β”‚
β”‚   161 β”‚   β”‚   β”‚   "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"            β”‚
β”‚   162 β”‚   β”‚   β”‚   f" '{repo_id}'. Use `repo_type` argument if needed."                           β”‚
β”‚   163 β”‚   β”‚   )                                                                                  β”‚
╰─────────────────────────────────────────────────────────────────
FreedomAI org

Yes, we should include the tokenizer files. For now, you can reuse the tokenizer files from FreedomIntelligence/phoenix-inst-chat-7b.

Thanks for pointing that out.

GeneZC changed discussion status to closed
GeneZC changed discussion status to open
FreedomAI org

And we have found a bug in our code, please use the updated version of our repo.

Sign up or log in to comment