Why does it report an error like this when running?

#12 by Simkinhu - opened

Hello openchat team,

root@autodl-container-a44e4284cd-59f17d3a:/autodl-tmp# python -m ochat.serving.openai_api_server --model openchat/openchat_3.5
FlashAttention not found. Install it if you need to train models.
FlashAttention not found. Install it if you need to train models.
2023-11-10 01:15:51,222 WARNING utils.py:581 -- Detecting docker specified CPUs. In previous versions of Ray, CPU detection in containers was incorrect. Please ensure that Ray has enough CPUs allocated. As a temporary workaround to revert to the prior behavior, set RAY_USE_MULTIPROCESSING_CPU_COUNT=1 as an env var before starting Ray. Set the env var: RAY_DISABLE_DOCKER_CPU_WARNING=1 to mute this warning.
2023-11-10 01:15:52,296 INFO worker.py:1673 -- Started a local Ray instance.
(pid=3924) FlashAttention not found. Install it if you need to train models.
(pid=3924) FlashAttention not found. Install it if you need to train models.
(AsyncTokenizer pid=3924) Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 11-10 01:15:56 llm_engine.py:72] Initializing an LLM engine with config: model='openchat/openchat_3.5', tokenizer='openchat/openchat_3.5', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 11-10 01:16:10 llm_engine.py:207] # GPU blocks: 3490, # CPU blocks: 2048
INFO: Started server process [3216]
INFO: Waiting for application startup.
INFO: Application startup complete.
ERROR: [Errno 99] error while attempting to bind on address ('::1', 18888, 0, 0): cannot assign requested address
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
root@autodl-container-a44e4284cd-59f17d3a:/autodl-tmp#

The environment is installed correctly and nothing appears to be wrong. The GPU is a single RTX 4090. The command I ran is:

python -m ochat.serving.openai_api_server --model openchat/openchat_3.5

OpenChat org

This means that your machine does not have an IPv6 loopback address available (the server tried to bind to ::1, as shown in the error). Try adding --host 127.0.0.1 as a command-line argument.
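For reference, a sketch of the full command with the suggested flag (the port 18888 comes from the log above, and the module name openai_api_server suggests it serves the standard OpenAI-compatible /v1/chat/completions route, so a quick check after startup could look like this):

python -m ochat.serving.openai_api_server --model openchat/openchat_3.5 --host 127.0.0.1

# The model name in the request body is an assumption; adjust it to match the served model if needed.
curl http://127.0.0.1:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openchat_3.5", "messages": [{"role": "user", "content": "Hello"}]}'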
