databricks/dbrx-instruct · remove "run on CPU" from documentation

remove "run on CPU" from documentation ed6cd8dc

eitanturok

Databricks org Apr 1, 2024

No description provided.

srowen

Databricks org Apr 1, 2024

(I agree with this FWIW)

Smilesz

Apr 1, 2024

The documentation should state the hardware requirements to run the model: https://huggingface.co/databricks/dbrx-instruct/discussions/28

Currently, running the example prints the following output:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 61/61 [00:04<00:00, 12.35it/s]
Setting `pad_token_id` to `eos_token_id`:100257 for open-end generation.

and then hangs. There does not seem to be any process or memory-caching activity.

If you change
outputs = model.generate(**input_ids, max_new_tokens=200)
to
outputs = model.generate(**input_ids, max_new_tokens=200, verbose=True)
then generation fails, because `verbose` is not an argument that `generate` accepts:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 61/61 [00:06<00:00,  9.90it/s]
Traceback (most recent call last):
  File "/Volumes/python/llm_dbrx-instruct.py", line 113, in <module>
    main(isInstruct=True)
  File "/Volumes/python/llm_dbrx-instruct.py", line 88, in main
    outputs = model.generate(**input_ids, max_new_tokens=200, verbose=True)
  File "/Users/user/.pyenv/versions/3.9.6/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/user/.pyenv/versions/3.9.6/lib/python3.9/site-packages/transformers/generation/utils.py", line 1325, in generate
    self._validate_model_kwargs(model_kwargs.copy())
  File "/Users/user/.pyenv/versions/3.9.6/lib/python3.9/site-packages/transformers/generation/utils.py", line 1121, in _validate_model_kwargs
    raise ValueError(
ValueError: The following `model_kwargs` are not used by the model: ['verbose'] (note: typos in the generate arguments will also show up in this list)

I am still learning, and it was not immediately apparent to me that 264 GB of RAM is required to run the model.
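For anyone else wondering where a figure like 264 GB comes from, here is a rough sketch of the arithmetic. It assumes DBRX has about 132 billion total parameters stored at 2 bytes each (bf16); neither number is stated in this thread, so treat both as assumptions. The helper names `weight_memory_gb` and `physical_ram_gb` are made up for illustration, and the estimate covers only the weights, not activations or the KV cache.

```python
import os


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold the weights, in decimal GB."""
    return num_params * bytes_per_param / 1e9


def physical_ram_gb():
    """Total physical RAM in decimal GB, or None if the platform does not expose it."""
    try:
        return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1e9
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on Windows, and some systems lack these keys.
        return None


# Assumption: total parameter count of the model, not taken from this thread.
DBRX_TOTAL_PARAMS = 132e9

required = weight_memory_gb(DBRX_TOTAL_PARAMS, bytes_per_param=2)
available = physical_ram_gb()

print(f"Weights alone need roughly {required:.0f} GB")
if available is None:
    print("Could not determine physical RAM on this platform")
elif available < required:
    print(f"Only {available:.0f} GB of RAM detected; loading will likely stall or swap")
else:
    print(f"{available:.0f} GB of RAM detected; the weights should fit")
```

Running a check like this before `from_pretrained` at least makes the mismatch visible up front instead of leaving the script hanging with no output.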

hanlintang changed pull request status to merged Apr 2, 2024