Text Generation
Transformers
PyTorch
English
llama
Inference Endpoints
text-generation-inference

How to use on GPU

#33
by p2991459 - opened

How can I use the model on CPU? When I run it in float32 it fails with the following error:

```
in rotary_kernel(OUT, X, COS, SIN, CU_SEQLENS, SEQLEN_OFFSETS, seqlen, nheads, rotary_dim, seqlen_ro, CACHE_KEY_SEQLEN, stride_out_batch, stride_out_seqlen, stride_out_nheads, stride_out_headdim, stride_x_batch, stride_x_seqlen, stride_x_nheads, stride_x_headdim, BLOCK_K, IS_SEQLEN_OFFSETS_TENSOR, IS_VARLEN, INTERLEAVED, CONJUGATE, BLOCK_M, grid, num_warps, num_stages, extern_libs, stream, warmup, device, device_type)

ValueError: Cannot find backend for cpu
```
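The `rotary_kernel` in the traceback is a Triton kernel, and Triton only ships GPU backends, which is why it raises `Cannot find backend for cpu` regardless of dtype. For context, the rotary position embedding that kernel computes can be expressed in plain PyTorch, which runs fine on CPU. Below is a minimal sketch of that math (the function name `apply_rotary_cpu` and the tensor layout `(batch, seqlen, nheads, headdim)` are assumptions for illustration, not this model's actual code path):

```python
import torch

def apply_rotary_cpu(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Pure-PyTorch rotary embedding, usable on CPU.

    x:   (batch, seqlen, nheads, headdim) -- assumed layout, for illustration
    cos: (seqlen, rotary_dim // 2) precomputed cosines per position/frequency
    sin: (seqlen, rotary_dim // 2) precomputed sines
    Only the first rotary_dim features of each head are rotated; the rest pass through.
    """
    rotary_dim = cos.shape[-1] * 2
    x_rot, x_pass = x[..., :rotary_dim], x[..., rotary_dim:]
    # Split the rotary features into the two halves that form (x1, x2) pairs.
    x1, x2 = x_rot.chunk(2, dim=-1)
    # Reshape cos/sin to (seqlen, 1, rotary_dim/2) so they broadcast over batch and heads.
    cos = cos[: x.shape[1]].unsqueeze(1)
    sin = sin[: x.shape[1]].unsqueeze(1)
    # Standard 2D rotation applied pairwise: (x1, x2) -> (x1*cos - x2*sin, x1*sin + x2*cos).
    rotated = torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
    return torch.cat([rotated, x_pass], dim=-1)
```

Since rotation preserves vector length, the output has the same per-head norm as the input, which is a quick sanity check for a fallback like this. Note this only illustrates the operation that fails on CPU; swapping it into the model would require editing the model's own attention code.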
