🚩 Report: Ethical issue(s)

#11
by AIdinner - opened

This model is claimed to be trained on LLaMA-2-70B, but "upstage" appears in its configuration, which is really strange.
[Screenshots of the model configuration showing "upstage", taken 2023-09-14]

Sorry for the late reply; I have been a little busy recently.
Here is the situation; let me explain.
First of all, since a LoRA adapter is a set of additive weight deltas on top of a base model, it can be merged into any model with the same architecture (a practice more commonly seen in text-to-image applications such as Stable Diffusion).
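To make the "additive weight deltas" point concrete, here is a toy NumPy sketch (all shapes, scales, and values are hypothetical, not taken from any real checkpoint): a LoRA adapter for a weight matrix stores two low-rank matrices, and merging simply folds their product into the base weight.

```python
import numpy as np

# Hypothetical toy dimensions: d_out x d_in weight, rank-r adapter.
d_out, d_in, r = 8, 8, 2
rng = np.random.default_rng(0)

W_base = rng.standard_normal((d_out, d_in))   # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01     # LoRA "A" matrix (r x d_in)
B = rng.standard_normal((d_out, r)) * 0.01    # LoRA "B" matrix (d_out x r)
scale = 2.0                                   # alpha / r scaling factor

# Merging folds the low-rank delta into the base weight in place.
# Any checkpoint whose weight matrices have these same shapes could
# accept the same delta, which is what makes cross-model merging possible.
W_merged = W_base + scale * (B @ A)

assert W_merged.shape == W_base.shape
```

After merging, the adapter is no longer needed at inference time; the merged matrix behaves like an ordinary dense weight.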

Secondly, this model was indeed finetuned from LLaMA-2-70B, but the resulting adapter was merged with upstage/llama2-70b-instruct-v2; this is a deployment strategy introduced by some other models. If this was misleading or you feel it raises an ethical issue, we are really sorry for that, and we will explain why this strategy was taken.
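The strategy described above can be sketched in the same toy NumPy setting (again, all names and shapes are hypothetical): a delta trained against one base can be folded into a different checkpoint, as long as the two share an architecture, because addition only cares about matching shapes.

```python
import numpy as np

d_out, d_in, r = 8, 8, 2
rng = np.random.default_rng(1)

# Two different checkpoints with the same architecture (same shapes),
# standing in for LLaMA-2-70B and an instruct-tuned variant.
W_trained_base = rng.standard_normal((d_out, d_in))
W_other_base = rng.standard_normal((d_out, d_in))

# One adapter delta, trained against W_trained_base.
A = rng.standard_normal((r, d_in)) * 0.01
B = rng.standard_normal((d_out, r)) * 0.01
delta = 2.0 * (B @ A)

# The same delta can be folded into either base.  The two merged
# results differ exactly by the difference between the bases, so the
# adapter's contribution is identical in both cases.
merged_trained = W_trained_base + delta
merged_other = W_other_base + delta
assert np.allclose(merged_other - merged_trained,
                   W_other_base - W_trained_base)
```

Whether the transferred delta actually behaves well on the second base is an empirical question; the shapes guarantee only that the merge is mechanically valid.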

Thirdly, if you finetune directly on upstage/llama2-70b-instruct-v2 with the mentioned data, the scores on the four metrics are expected to decrease (I experimented several times). This suggests the base model may already be biased toward those metrics, and adding more data exacerbates the bias.

Fourth, we actually did some follow-up work after this model (mixing in more data and evaluating on other benchmarks), and we found that: 1) finetuning with additional data does not bring a large increase on the metrics (our best finetuned model reached an average of ~75.5 in local evaluation and never improved further with more data); and, more importantly, 2) the evaluation scores on HumanEval and GSM8K (among others) decrease considerably compared with the original LLaMA-2-70B. We noticed that the finetuned model is more or less metric-biased, so we turned our strategy to large-scale dataset mixing and full-parameter pretraining.
