microsoft/Phi-3-mini-128k-instruct

I can't get over how astonishingly well this 3.8B-parameter model sometimes performs.

However, it's not doing well on the lmsys arena, and I'm pretty sure it's because there's an annoying 3rd party built into Phi3.

That is, there's a part of Phi mini that's attempting to just give me the desired answer, just like all other LLMs I've used. However, there's an annoying 3rd party that keeps butting in at the beginning, end, and sometimes the middle, of responses with information that ~99% of the time is entirely unwanted by the user.

I feel like I'm rolling the dice every time I send a prompt to Phi3. Will it just give me the answer, or will it decide it knows better than me what I want? Worse yet, will it decide to lecture me about the proper way to express myself when I simply asked for the definition of a colorful colloquialism that I came across on social media?

It's abundantly clear that when you fine-tuned Phi-3 you weren't putting yourselves in the shoes of potential users, but were instead asking yourselves how the various outputs would make you look on the world's stage.

PS - Toxicity tests are fundamentally flawed. Using ubiquitous curse words, or telling ubiquitous blonde jokes, isn't nearly as toxic as compulsively telling law-abiding, decent adults the proper way to speak, behave, and show respect for others.
