runtime error
Exit code: 1. Reason: se,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 5,
    max_input_tokens: None,
    max_input_length: None,
    max_total_tokens: None,
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: Some(
        4096,
    ),
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: None,
    hostname: "r-sidreds06-mhcbv1-scmsmucm-72cb9-lndyu",
    port: 80,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: None,
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 1.0,
    rope_scaling: None,
    rope_factor: None,
    json_output: false,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    api_key: None,
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: false,
    max_client_batch_size: 4,
    lora_adapters: None,
    usage_stats: On,
    payload_limit: 2000000,
    enable_prefill_logprobs: false,
}
2025-03-19T00:42:36.170778Z  WARN text_generation_launcher::gpu: Cannot determine GPU compute capability: ModuleNotFoundError: No module named 'torch'
2025-03-19T00:42:36.170806Z  INFO text_generation_launcher: Using attention flashinfer - Prefix caching true
2025-03-19T00:42:36.170895Z  INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2025-03-19T00:42:36.171020Z  INFO download: text_generation_launcher: Starting check and download process for Sidreds06/MHCV1
2025-03-19T00:42:36.179566Z ERROR download: text_generation_launcher: Permission denied (os error 13)
Error: DownloadError