OpenAssistant/oasst-sft-6-llama-30b deployment config

#177
by MazH24 - opened

I'm trying to deploy the app locally using the UI and the inference server. Both are running and I'm able to chat with the model, but the local version's responses are inconsistent with HuggingChat. I suspect this is because of differences in the initial prompt and the generation parameters.
I tried to find the initial prompt and parameter config, but I only found the ones for "OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5". Does anyone know where to find them?

Hugging Chat org
•
edited Jun 1, 2023

Unfortunately those won't give you the same behavior as HuggingChat, since they're another model's config.

Hugging Chat org

The OpenAssistant/oasst-sft-6-llama-30b model is on an endpoint we use only for the hosted version of HuggingChat, and it's not public; that's why it's not documented in the .env 😊

If you want to spin up your own models that we don't currently make available publicly, you can have a look at https://github.com/huggingface/text-generation-inference; it's the same backend that we use for HuggingChat.
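Once a text-generation-inference server is running, a quick way to sanity-check it is with the text-generation Python client (a minimal sketch, assuming the server listens on http://127.0.0.1:8080):

    # Minimal smoke test against a local text-generation-inference server.
    # Assumes the server is listening on http://127.0.0.1:8080.
    from text_generation import Client  # pip install text-generation

    client = Client("http://127.0.0.1:8080")
    response = client.generate("Hello, who are you?", max_new_tokens=64)
    print(response.generated_text)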

@nsarrazin
I'm actually using the text-generation-inference server in combination with chat-ui. For the OpenAssistant/oasst-sft-6-llama-30b model, I managed to deploy the model on 4 GPUs, and I'm using the following config set:
    {
      "name": "OpenAssistant/oasst-sft-6-llama-30b-xor",
      "endpoints": [{ "url": "http://127.0.0.1:8080/generate_stream" }],
      "userMessageToken": "<|prompter|>",
      "assistantMessageToken": "<|assistant|>",
      "messageEndToken": "<|endoftext|>",
      "preprompt": "Below are a series of dialogues between various people and an AI assistant. The AI tries to be helpful, polite, honest, sophisticated, emotionally aware, and humble-but-knowledgeable. The assistant is happy to help with almost anything, and will do its best to understand exactly what is needed. It also tries to avoid giving false or misleading information, and it caveats when it isn't entirely sure about the right answer. That said, the assistant is practical and really does its best, and doesn't let caution get too much in the way of being useful.\n-----\n",
      "parameters": {
        "temperature": 0.2,
        "top_p": 0.95,
        "repetition_penalty": 1.2,
        "top_k": 50,
        "truncate": 1000,
        "max_new_tokens": 1024
      }
    }
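For context, chat-ui combines these tokens into a single prompt string, roughly along these lines (an illustrative sketch, not chat-ui's actual code; build_prompt is a hypothetical helper):

    # Illustrative sketch of how the tokens above combine into a prompt.
    # build_prompt is a hypothetical helper, not chat-ui's actual code.
    def build_prompt(messages, preprompt,
                     user_token="<|prompter|>",
                     assistant_token="<|assistant|>",
                     end_token="<|endoftext|>"):
        """messages: list of (role, text) pairs, role is "user" or "assistant"."""
        prompt = preprompt
        for role, text in messages:
            token = user_token if role == "user" else assistant_token
            prompt += f"{token}{text}{end_token}"
        # A trailing assistant token cues the model to generate the next reply.
        return prompt + assistant_token

    print(build_prompt([("user", "Hi there!")], preprompt="...\n-----\n"))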

But unfortunately I'm not able to get the same results that I get from the official Open Assistant UI and HuggingChat. I understand that I can't get exactly the same outputs because of the temperature parameter, but I'm also not getting the same level of reasoning, and I suspect the problem is with my config set.

I understand that your hosted version is not public, but is there any chance you could share the config set for the model? :)

Hugging Chat org

Oh, my bad, I misread your top comment, sorry. Our config set looks pretty much the same, except for the messageEndToken and the parameters:

    "messageEndToken": "</s>", 
    "parameters": {
      "temperature": 0.9,
      "top_p": 0.95,
      "repetition_penalty": 1.2,
      "top_k": 50,
      "truncate": 1000,
      "max_new_tokens": 1024,
      "stop":["</s>"]
    },
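If you want to test those parameters outside chat-ui, they map directly onto TGI's /generate route (a sketch; the prompt string is just an example, and the endpoint is assumed to be the one from your config above):

    # Sketch: the same parameters sent to TGI's /generate route directly.
    # Assumes the endpoint from the config above (http://127.0.0.1:8080).
    import requests

    payload = {
        "inputs": "<|prompter|>Hello!</s><|assistant|>",
        "parameters": {
            "temperature": 0.9,
            "top_p": 0.95,
            "repetition_penalty": 1.2,
            "top_k": 50,
            "truncate": 1000,
            "max_new_tokens": 1024,
            "stop": ["</s>"],  # cut generation at the end-of-message token
        },
    }
    r = requests.post("http://127.0.0.1:8080/generate", json=payload, timeout=120)
    r.raise_for_status()
    print(r.json()["generated_text"])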

Let me know if that helps?

Yeah, it worked perfectly, thank you!

MazH24 changed discussion status to closed
