Ready-to-use Mistral-7B-Instruct-v0.1-GGUF model as an OpenAI API-compatible endpoint

#29
by limcheekin - opened

Hi there,

I deployed the model as an OpenAI API-compatible endpoint at https://huggingface.co/spaces/limcheekin/Mistral-7B-Instruct-v0.1-GGUF.

I also created a Jupyter notebook to get you started using the API endpoint in no time.
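For anyone who wants to skip the notebook, here is a minimal sketch of building an OpenAI-style chat completion request against the Space. The base URL and model name below are my assumptions (check the Space's README for the exact values), and the request-sending lines are commented out so the sketch runs offline:

```python
import json

# Assumed base URL of the Space's OpenAI-compatible API; verify in the Space's README.
BASE_URL = "https://limcheekin-mistral-7b-instruct-v0-1-gguf.hf.space/v1"

# Standard OpenAI-style chat completion request body.
payload = {
    "model": "mistral-7b-instruct-v0.1",  # model name assumed; some servers ignore it
    "messages": [
        {"role": "user", "content": "Explain GGUF quantization in one sentence."}
    ],
    "max_tokens": 128,
    "temperature": 0.7,
}

# To actually call the endpoint (requires network access):
# import requests
# r = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=120)
# print(r.json()["choices"][0]["message"]["content"])

print(json.dumps(payload, indent=2))
```

Because the endpoint follows the OpenAI wire format, the official `openai` client or LangChain's OpenAI integrations should also work by pointing their base URL at the Space.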

Lastly, if you find this resource valuable, your support in the form of starring the space would be greatly appreciated.

Thank you.

First of all, thanks for this! But do we need to purchase OpenAI credits for this? I am a beginner.


It's compatible with OpenAI's API, not running on it. I'm unsure whether the HF API is free, but any charges would be theirs, not OpenAI's.

It is free of charge.
But I think HF definitely has a cap on the number of requests that can be made to free-tier HF Spaces per hour or per day. Does anyone here know the cap?

That's the reason your support is important.

Alternatively, you can duplicate the space and run your own instance for free.

I'm seeing this warning: "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained." Are the embeddings of mistralai/Mistral-7B-Instruct-v0.1 fine-tuned?


Thank you so much for this, much appreciated! I used it to build an app for question answering over a PDF. I just wanted to ask: when I use this model through this Space, which API does the request actually hit? Is it OpenAI's, or Hugging Face's API for this model? HF's API wasn't giving the full output (only around 10 tokens) when used through LangChain. And if it's OpenAI, how is it free? Please help, I am a beginner.

Thanks a lot, much appreciated, it works like a charm! I do have one little issue, however: I get this weird output where every new sentence starts with a number. Do you know why that might be the case?
[Screenshot: Screenshot 2023-10-02 at 5.14.07 PM.png]


It is neither the OpenAI API nor the Hugging Face API. The Space runs on the following generous free tier from HF:

| Hardware | GPU Memory | CPU | Memory | Disk | Hourly Price |
|----------|------------|-----|--------|------|--------------|
| CPU Basic | - | 2 vCPU | 16 GB | 50 GB | Free! |

You can find more information at https://huggingface.co/docs/hub/spaces-overview#hardware-resources

You can use the free Space for hosting open-source text embeddings models such as BAAI/bge-large-en, intfloat/e5-large-v2, sentence-transformers/all-MiniLM-L6-v2, sentence-transformers/all-mpnet-base-v2, etc., as an OpenAI API-compatible embeddings endpoint using the following Python package:
https://github.com/limcheekin/open-text-embeddings
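As a quick sanity check of any OpenAI-compatible embeddings endpoint, you can compare the returned vectors with cosine similarity. A small sketch follows; the request body shape is the standard OpenAI embeddings format, the model name is illustrative, and the actual HTTP call is commented out since it needs network access:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# OpenAI-style embeddings request body (model name is illustrative):
payload = {
    "model": "sentence-transformers/all-MiniLM-L6-v2",
    "input": ["GGUF is a quantized model format.", "What format is GGUF?"],
}

# To actually call an endpoint (requires network access):
# import requests
# r = requests.post("<endpoint>/v1/embeddings", json=payload, timeout=60)
# vectors = [d["embedding"] for d in r.json()["data"]]
# print(cosine_similarity(vectors[0], vectors[1]))

print(cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # identical vectors → 1.0
```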


I get similar output and am not sure why. Perhaps you need to play around with the prompts, or try the original unquantized model weights.
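One thing worth checking when playing with the prompts: Mistral-7B-Instruct expects its `[INST] ... [/INST]` instruction template, and sending raw text without it is a common cause of oddly formatted completions. A minimal sketch of the template, simplified from the model card (verify the exact special tokens there):

```python
def format_mistral_prompt(messages):
    """Wrap chat turns in Mistral-Instruct's [INST] template (simplified sketch)."""
    prompt = "<s>"
    for m in messages:
        if m["role"] == "user":
            prompt += f"[INST] {m['content']} [/INST]"
        elif m["role"] == "assistant":
            # Assistant turns are followed by the end-of-sequence token.
            prompt += f" {m['content']}</s>"
    return prompt

print(format_mistral_prompt([{"role": "user", "content": "Summarize this PDF."}]))
# → <s>[INST] Summarize this PDF. [/INST]
```

A chat-completions server normally applies this template for you; it matters mainly when you hit a raw `/completions`-style endpoint with your own prompt string.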

Hi there,

I just enabled (turned on) the embeddings endpoint. Go ahead and test it out yourself; I'd highly appreciate it if you could share your results here on how it compares to other open-source text embeddings models such as BAAI/bge-large-en, intfloat/e5-large-v2, sentence-transformers/all-MiniLM-L6-v2, sentence-transformers/all-mpnet-base-v2, etc.

By the way, I just created the same endpoints for Mistral-7B-OpenOrca-GGUF model at https://huggingface.co/spaces/limcheekin/Mistral-7B-OpenOrca-GGUF.


Use vLLM: https://github.com/vllm-project/vllm


Thanks for sharing. Does vLLM support GGUF models?


Not sure. Why not try it out?

python -m vllm.entrypoints.openai.api_server --model=

I haven't used GGUF with vLLM before.


I am very much focused on using GGUF models, so I will pass on it for now.
