GPT-Like and Very Good Reasoning
Very good, but concerning resource usage, it has slow inference of up to 47 seconds. This smaller model would be a worthy component of a mixture-of-experts model; fine-tuned with a system message, it would be close to GPT-3.5.
This is one of my older models, and it does not perform as well as my newer models. You can check out https://huggingface.co/M4-ai/NeuralReyna-Mini-1.8B-v0.2, which is of similar size and performs excellently. It's one of the best models available under 2 billion parameters. It should also have faster inference because it uses optimizations such as flash attention and sliding window attention.
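For reference, here's a minimal sketch of how you could load it with the standard transformers API (this snippet is my assumption, not an official example; flash attention needs a supported GPU and the flash-attn package installed):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "M4-ai/NeuralReyna-Mini-1.8B-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",  # swap for "sdpa" if flash-attn isn't installed
    device_map="auto",  # requires the accelerate package
)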
That model appears to be set up for text completion. I also tried:
<|USER|> Craft me a list of some nice places to visit around the world. <|ASSISTANT|>
and no text was generated.
If it's using the conversational template, a recent update has broken it for now.
How would you recommend prompting the model?
Is it for a pipeline where you start the answer and let the model complete it?
On a second test, it appears to work as long as there is a line break after the user message (see the sketch after this example):
<|USER|> Craft me a list of some nice places to visit around the world.
<|ASSISTANT|> Some cool places you might want to go:
- Paris, France
- Tokyo, Japan
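This is roughly how I'm calling it (a sketch of the kind of text-generation pipeline call I mean; the exact settings are placeholders):

from transformers import pipeline

generator = pipeline("text-generation", model="M4-ai/NeuralReyna-Mini-1.8B-v0.2")
# The newline after the user message matters -- without it, nothing was generated.
prompt = (
    "<|USER|> Craft me a list of some nice places to visit around the world.\n"
    "<|ASSISTANT|> Some cool places you might want to go:\n"
)
print(generator(prompt, max_new_tokens=128)[0]["generated_text"])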
The model that I mentioned earlier is trained on the ChatML prompt format. If you want to get a proper response, you can format it like this:
<|im_start|>user
Craft me a list of some nice places to visit around the world.<|im_end|>
<|im_start|>assistant
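Assuming the tokenizer ships this ChatML chat template (worth double-checking in tokenizer_config.json), you can also let transformers render the string for you; a quick sketch:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("M4-ai/NeuralReyna-Mini-1.8B-v0.2")
messages = [
    {"role": "user", "content": "Craft me a list of some nice places to visit around the world."},
]
# Renders the ChatML prompt above, ending with an open assistant turn.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)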
Thanks for your time! Now is there any way to inject a system message or chat history for extra context?
This model is indeed trained to take in system messages. If you want to add a system message, you can do something like this:
<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Craft me a list of some nice places to visit around the world.<|im_end|>
<|im_start|>assistant
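As a rough end-to-end sketch (the generation settings here are placeholders, not recommendations):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "M4-ai/NeuralReyna-Mini-1.8B-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Craft me a list of some nice places to visit around the world."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))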
I have this agent framework that engineers the LLM's system message with text vision, memories, time, and identity. I might try experimenting with that to see how self-aware the model can be. But I was wondering: what is the maximum context window for this model?
The maximum context window is 1600 tokens.
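If your framework injects memories and history into the system message, it's worth budgeting prompt tokens against that limit. A hypothetical helper (the function names and the reserve value are my own assumptions; 1600 is the figure above):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("M4-ai/NeuralReyna-Mini-1.8B-v0.2")
MAX_CONTEXT = 1600      # stated context window
RESERVED_REPLY = 256    # tokens left free for the model's answer

def prompt_len(messages):
    # Token count of the rendered prompt (apply_chat_template tokenizes by default).
    return len(tokenizer.apply_chat_template(messages, add_generation_prompt=True))

def trim_history(messages):
    # Drop the oldest non-system turns until the prompt fits the window.
    messages = list(messages)
    while prompt_len(messages) > MAX_CONTEXT - RESERVED_REPLY and len(messages) > 2:
        messages.pop(1)  # index 0 is the system message
    return messages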