Model not working

#3
by Imad-el-achiri - opened

I used llama_cpp to load the 4-bit GGUF version of the model with the prompt from the paper, but I get an empty string as a response. Reranking does work when I retrieve fewer than 5 passages, so I suspect the context window is being exceeded. How do I implement the sliding window approach? A sketch of what I have in mind is below.
I also noticed that inference is quite slow (>1 s per request).
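
This is roughly what I mean: a minimal sliding-window reranking sketch using llama-cpp-python, assuming a listwise reranker. The model path, prompt template, window size, and stride are placeholders, and the exact prompt from the paper should be substituted:

```python
from llama_cpp import Llama

# Hypothetical path and context size -- adjust to the actual GGUF file.
llm = Llama(model_path="model-q4.gguf", n_ctx=4096)

def rerank_window(query, passages):
    """Rerank a small batch of passages; returns them in model-ranked order."""
    numbered = "\n".join(f"[{i+1}] {p}" for i, p in enumerate(passages))
    # Placeholder prompt -- replace with the template from the paper.
    prompt = (
        f"Rank the following passages by relevance to the query.\n"
        f"Query: {query}\nPassages:\n{numbered}\n"
        f"Output the ranking, e.g. [2] > [1] > [3]:\n"
    )
    out = llm(prompt, max_tokens=128)["choices"][0]["text"]
    # Parse identifiers like [3] from the response; keep valid, unique ones.
    order = []
    for tok in out.replace(">", " ").split():
        tok = tok.strip("[] ")
        if tok.isdigit() and 1 <= int(tok) <= len(passages) and int(tok) - 1 not in order:
            order.append(int(tok) - 1)
    # Fall back to original order for anything the model didn't mention.
    order += [i for i in range(len(passages)) if i not in order]
    return [passages[i] for i in order]

def sliding_window_rerank(query, passages, window=4, stride=2):
    """Slide a window from the bottom of the list to the top, reranking
    each window so strong passages bubble upward across windows."""
    ranked = list(passages)
    start = max(len(ranked) - window, 0)
    while True:
        end = start + window
        ranked[start:end] = rerank_window(query, ranked[start:end])
        if start == 0:
            break
        start = max(start - stride, 0)
    return ranked
```

The idea is that each window stays small enough to fit in the context, and overlapping windows (stride < window) let relevant passages from the bottom of the list move upward across successive windows. Is this the right way to do it?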
