Finetuning llama2

#47 opened 9 months ago by zuhashaik

Any example of batch inference?

#46 opened 9 months ago by PrintScr

How to set max_split_size_mb?

#30 opened 12 months ago by neo-benjamin · 1 reply

max_position_embeddings = 2048?

#29 opened 12 months ago by zzzac · 1 reply

Load into 2 GPUs

#28 opened 12 months ago by sauravm8 · 3 replies

Load model into TGI

#27 opened 12 months ago by schauppi

Perplexity

#22 opened 12 months ago by gsaivinay

70B with multiple A5000s

#21 opened 12 months ago by nashid · 6 replies

Inference time with TGI

#15 opened about 1 year ago by jacktenyx · 1 reply

Can't launch with TGI

#14 opened about 1 year ago by yekta · 6 replies

Bloke - add 70B ggml version please

#8 opened about 1 year ago by mirek190 · 4 replies

text-generation-inference error

#5 opened about 1 year ago by msteele · 7 replies

Output always 0 tokens

#4 opened about 1 year ago by sterogn · 11 replies