Generated text is garbled?
I'm calling a dedicated inference endpoint.
In Postman I'm passing the following JSON:
{
"inputs": "Who is Elon Musk?"
}
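For reference, the same call from Python looks roughly like this (the endpoint URL and token are placeholders for my real values):

import requests

API_URL = "https://my-endpoint.endpoints.huggingface.cloud"  # placeholder endpoint URL
headers = {"Authorization": "Bearer hf_xxx", "Content-Type": "application/json"}  # placeholder token

payload = {"inputs": "Who is Elon Musk?"}
print(requests.post(API_URL, headers=headers, json=payload).json())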
But the output is garbled. What am I doing wrong? Is the input field not meant to be inputs?
Output:
[
{
"generated_text": "Who is Elon Musk?Edited by Kelly Bowen\nDiscover the incredible story of the man behind SpaceX, the Boring Company and Tesla. How did he become a billionaire? And, how did he build from being a self-taught coder in his mother’s house to becoming the most famous entrepreneur on the planet? \n1. Elon Musk, college dropout-turned-billionaireTesla, The Boring Company, and SpaceX about jessicaderosa about rahuldesai about magicboutique\n2."
}
]
Or
[
{
"generated_text": "Who is Elon Musk? Or, as this quiz is framed: ‘Who is Musk?’ For anyone who didn’t know the answer, though, good luck. As brilliant as Musk may be, we wouldn’t say the man is a genius. The quiz posits Musk as the man behind Tesla, SpaceX, Paypal, Hyperloop, SolarCity. Just to name the biggest.\nSomeone else was/is behind PayPal – Jordan River and Jeremy Stoppelman. And the Hyperloop, which hasn’t existed in anything but theoretical form"
}
]
This is not an instruction-finetuned model.
My output has been like that too (not entirely as bad), but it improves drastically when using the same model on ollama; I'm not sure why.
It is basically a random model!!
They cut it down from the larger model, hence the total madness!
It needs fine-tuning.
I don't know why they have done this... but perhaps it is to conceal the data inside! As a larger company they have issues with copyrights!
They are also part of a system tied to the government, so they CANNOT release a fine-tuned model, only a model in raw mode (hence the training: it is semi-pretrained, i.e. only on the plain text-generation task) on random data, pushing in anything and everything, hence your output.
If you flip it to instruct training in Colab etc., you can install the correct chat template (take it from Code Llama) and train as you normally would for Code Llama. You may even find the model does not load correctly (NOT ALL THE TENSORS ARE TRAINED), so it must be loaded in full precision to train, or loaded and saved pretrained first. That "seals" the tensors, enabling the reloaded model to be quantized in memory for training with a PEFT adapter.
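Rough sketch of what I mean, assuming a Llama-style model (the model id, paths and LoRA settings are just placeholders):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_id = "some-org/raw-base-model"  # placeholder model id

# 1) Load in full precision and re-save, so every tensor is materialised on disk
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float32)
tokenizer = AutoTokenizer.from_pretrained(base_id)
# tokenizer.chat_template = "..."  # e.g. copy in the chat template you want (Code Llama style)
model.save_pretrained("./base-full-precision")
tokenizer.save_pretrained("./base-full-precision")

# 2) Reload the saved copy quantized in memory and attach a PEFT (LoRA) adapter for training
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "./base-full-precision",
    quantization_config=bnb_config,
    device_map="auto",
)
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# ...then run your usual instruct fine-tuning loop on top of this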
Sorry, bro!
But for larger companies I would expect this to be their output anyway: as you can see, heavily guardrailed for specific data output and input, even rewriting the input to fit their needs! So uncensoring the model with some overfitting and re-merging into the base again will also remove some of this stuff!
Open source is not supposed to be the scraps, brother!
Hence even Mistral suddenly gating their repo, and Elon releasing an unusable model! So only if you have the money and technology can you run the model, i.e. it's for the rich only!
But again, you can extract the layers of the model and make any size (B) you want! So open source means working for your lunch (when it comes to the corporations). Even Microsoft removed their models all of a sudden!
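A very rough sketch of that layer-extraction idea, assuming a Llama-style architecture where the decoder blocks live in model.model.layers (the model id and layer count are just placeholders, and the truncated model would still need further training):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("some-org/large-base-model", torch_dtype=torch.float16)

# Keep only the first N decoder blocks to produce a smaller model
keep = 16
model.model.layers = torch.nn.ModuleList(list(model.model.layers)[:keep])
model.config.num_hidden_layers = keep

model.save_pretrained("./truncated-model")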
My output has been like that too (not entirely as bad), but it improves drastically when using the same model on ollama; I'm not sure why.
If you are not sure why, it doesn't mean some underground magic is going on.
It's just that by default you are probably running greedy generation; make sure to adapt the generation_config arguments to fit your needs:
- beam sampling
- temperature
- top-k
- top-p
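With a dedicated endpoint running the standard text-generation task, these can usually be passed in a parameters object next to inputs; something along these lines (the values, URL and token are just examples):

import requests

API_URL = "https://my-endpoint.endpoints.huggingface.cloud"  # placeholder endpoint URL
headers = {"Authorization": "Bearer hf_xxx", "Content-Type": "application/json"}  # placeholder token

payload = {
    "inputs": "Who is Elon Musk?",
    "parameters": {
        "do_sample": True,      # switch off pure greedy decoding
        "temperature": 0.7,
        "top_k": 50,
        "top_p": 0.9,
        "max_new_tokens": 200,
    },
}
print(requests.post(API_URL, headers=headers, json=payload).json())

If you generate locally instead, the same knobs go into transformers' GenerationConfig.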
@LeroyDyer no, the reason they do that is so that anyone can fine-tune the model much more easily. The instruct model is one fine-tuned version of this model.