error deploying model

#38
by jim-bo - opened

Hey, I am trying to deploy this model for text embedding on a dedicated inference endpoint and I get the following error. I was wondering if you had any luck deploying this model using HF TEI.

Thanks

```
Exit code: 1. Reason: s","log.line":159,"target":"tokenizers::tokenizer::serialization","filename":"/usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs","line_number":159}
{"timestamp":"2024-12-23T00:29:59.492834Z","level":"WARN","message":"Warning: Token '<|im_end|>' was expected to have ID '151645' but was given ID 'None'","log.target":"tokenizers::tokenizer::serialization","log.module_path":"tokenizers::tokenizer::serialization","log.file":"/usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs","log.line":159,"target":"tokenizers::tokenizer::serialization","filename":"/usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs","line_number":159}
{"timestamp":"2024-12-23T00:29:59.494094Z","level":"INFO","message":"Maximum number of tokens per request: 512","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":188}
{"timestamp":"2024-12-23T00:29:59.495050Z","level":"INFO","message":"Starting 8 tokenization workers","target":"text_embeddings_core::tokenization","filename":"core/src/tokenization.rs","line_number":28}
{"timestamp":"2024-12-23T00:29:59.797151Z","level":"INFO","message":"Starting model backend","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":230}
{"timestamp":"2024-12-23T00:29:59.799706Z","level":"ERROR","message":"Could not start ORT backend: Could not start backend: File at `/repository/onnx/model.onnx` does not exist","target":"text_embeddings_backend","filename":"backends/src/lib.rs","line_number":225}
{"timestamp":"2024-12-23T00:29:59.802154Z","level":"ERROR","message":"Could not start Candle backend: Could not start backend: Qwen2 is only supported on Cuda devices in fp16 with flash attention enabled","target":"text_embeddings_backend","filename":"backends/src/lib.rs","line_number":255}
Error: Could not create backend

Caused by:
    Could not start backend: Could not start a suitable backend
```
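For what it's worth, TEI emits these logs as one JSON object per line, so the actual failure chain (the two ERROR records: the missing `/repository/onnx/model.onnx` for the ORT backend, and the Candle backend's Qwen2/CUDA-fp16/flash-attention requirement) is easier to spot if you filter out the INFO noise first. A quick sketch — the `failure_messages` helper is my own, not part of TEI:

```python
import json

def failure_messages(log_lines):
    """Return (level, message) pairs for WARN/ERROR records in TEI's JSON logs."""
    messages = []
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines (e.g. the "Exit code" prefix)
        if record.get("level") in ("WARN", "ERROR"):
            messages.append((record["level"], record["message"]))
    return messages

# Two of the ERROR lines from the log above, trimmed to the relevant fields
logs = [
    '{"timestamp":"2024-12-23T00:29:59.799706Z","level":"ERROR","message":"Could not start ORT backend: Could not start backend: File at `/repository/onnx/model.onnx` does not exist"}',
    '{"timestamp":"2024-12-23T00:29:59.802154Z","level":"ERROR","message":"Could not start Candle backend: Could not start backend: Qwen2 is only supported on Cuda devices in fp16 with flash attention enabled"}',
]

for level, msg in failure_messages(logs):
    print(f"{level}: {msg}")
```

Reading the two errors together: TEI tried the ONNX backend (no `onnx/model.onnx` in the repo) and then fell back to Candle, which for Qwen2 needs a CUDA instance running fp16 with flash attention — so a CPU or non-flash-attention endpoint would hit exactly this failure.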
