Really??? Unbelievable

#2
by Qinghan - opened

if True:
    print('NB plus')

@Qinghan Do you know anything about this team? Cannot find any introduction of the Xwin-LM team

@Mlemoyne https://github.com/REIGN12

Should be a collaboration between Tsinghua University and Microsoft Research Asia.

This is explosive news, while all the other models are still catching up to GPT-3.5.

Its performance is questionable; it often deviates from the question. Taken in isolation, the answers sometimes look perfect, almost textbook, but it doesn't seem to truly understand the questions the way other 70B-level models (or ChatGPT/Claude) do. Sometimes when you ask it a question, it responds by inventing usernames in an online-forum style and carries on several rounds of self-questions and answers.
I don't think it is as good as GPT-4.
That said, its Chinese is pretty good.

@hiarcs Wow! Could you share some questions where it gives bad responses? I will try them. I just wonder: if it is like that, how did it beat GPT-4 on the benchmark?

Would you show some example Q&A, please?

@Mlemoyne @Yhyu13 I can't accurately recall all the details, but there were two questions, roughly as follows:

  1. How to create a microservice project using Micronaut that accepts POST requests, passes the commands included in the request to a local process, then retrieves the stdout and forwards it via POST to a third-party service? When answering this question, it fell into self-questioning and self-answering; both the questions and answers started with "@someusername," and "they" were talking about the Spring Framework, not Micronaut.
  2. How to love? This question was answered very well until something like "in this book..." appeared. Viewed in isolation, the answer was perfect, but it was clearly not appropriate.
    My intuitive feeling is that it is trying to play a certain role, or it erroneously thinks it is in some virtual scenario rather than a normal conversation.
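For what it's worth, here is a minimal Python sketch of the pipeline question 1 asks about (run a command locally, capture its stdout, POST it onward), just to pin down what the model was supposed to answer. It is deliberately not Micronaut/Java, and the third-party URL and function names are hypothetical:

```python
import subprocess
import urllib.request

def run_command(cmd: list[str]) -> str:
    """Run a local process and capture its stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

def forward_stdout(stdout: str, url: str) -> None:
    """POST the captured stdout to a third-party service (URL is hypothetical)."""
    req = urllib.request.Request(
        url,
        data=stdout.encode(),
        method="POST",
        headers={"Content-Type": "text/plain"},
    )
    urllib.request.urlopen(req)
```

A correct Micronaut answer would wrap the same two steps in an `@Controller` with a `@Post` route and an injected HTTP client; the point is that the model talked about Spring instead of anything like this.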

And yes, it is so skilled at roleplaying.

I have to point out that, according to the model card, it has a win rate of over 60% against GPT-4, which is remarkable. However, in the same table, LLaMA 2 70B has also beaten, or at least tied with, GPT-4!?
And LLaMA 2 13B seems to be significantly better than ChatGPT. For reference, on Hugging Face's leaderboard this model looks like an ordinary 70B model. So I think the evaluation criteria might need to be examined.

@hiarcs @Qinghan @wawoshashi @Yhyu13 Do you guys have a WeChat group where we can talk about LLMs?

@hiarcs I tried both of those questions using the Q5_K_M quant and got decent responses; neither of the issues you describe appeared. This was via text-generation-webui with the Yara defaults. Do you know what parameters you were using?
