Chat templates and "censored" performance

#3
by nfunctor - opened

Hello, and thank you for your work! If you don't mind answering a few questions:

  1. It appears that the tokenizer has an empty chat template. What should it be, Llama 3's or ChatML? I actually tested both, and so far I see adequate performance either way, but I wonder what you think.
  2. For "safe content" uses, would you advise this model or NeuralDaredevil ?

The last question may be a skill issue on my end, but still:

  3. The model is currently stored in a way that takes up a lot of disk space (I assume this is because it's a merge?). Do you plan to repackage the safetensors? Reloading the model is a bit of a wait on some setups, including mine.

Thanks again !

Owner

Hey, thanks for your interest.

  1. I just updated it; it now uses the Llama 3 chat template (a minimal sketch follows this list).

  2. I'd recommend this model if that's not a concern.

  3. Ah, thanks for noticing. I had actually converted the weights back to BF16 but forgot to remove the previous FP32 weights. That's done now; the new weights are a lot lighter.
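
For anyone checking the template fix locally, here is a minimal sketch, assuming the standard transformers API; the repo id is a placeholder for this thread's model:

```python
from transformers import AutoTokenizer

model_id = "mlabonne/Daredevil-8B"  # placeholder -- substitute the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "Hello, who are you?"}]

# With the Llama 3 template now set in tokenizer_config.json, this renders the
# <|start_header_id|>user<|end_header_id|> ... <|eot_id|> conversation format.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```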

Thanks for getting back! Just to be sure:

  1. Does this template update impact the published benchmarks in any way? I haven't looked into how the benchmarks are run in detail, but I'd imagine the Q&A could be done in conversation form.
  2. OK, I'll re-download! I hadn't noticed there were BF16 weights before. Should one also load in BF16 for best inference, or can one keep FP16 as shown on the model card? I'd imagine the two differ at least somewhat at inference time...

Owner

  1. The chat template is not used in these benchmarks, so the published results should be unaffected.

  2. Yeah, it's better to load in BF16 if possible. I updated the example in the model card to reflect that (see the sketch below).
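
A minimal loading sketch for item 2, assuming a GPU with bfloat16 support; the repo id is again a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlabonne/Daredevil-8B"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load in the checkpoint's native BF16 rather than casting to FP16:
# BF16 keeps FP32's exponent range, so this avoids a lossy BF16 -> FP16 round-trip.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```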

Hi,

Just wanted to mention that I also had to manually set a pad token to get batched inference working. Maybe you can update the config so it's set automatically?

Also, it seems everything discussed here (BF16 weights, chat template, tokens) applies to the abliterated model(s) as well. I'm simply reusing the Daredevil tokenizer for NeuralDaredevil-abliterated, but it would be nice to have a fix there too. Thanks!

Owner

Manually setting the pad token sounds standard to me; Meta's own generation_config.json doesn't set one either.
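
For reference, the usual workaround is to reuse EOS as the pad token. A sketch, assuming a decoder-only Llama 3 tokenizer (placeholder repo id):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mlabonne/Daredevil-8B")  # placeholder

# No dedicated pad token ships with Llama 3, so reuse EOS for padding.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Decoder-only models should be left-padded for batched generation.
tokenizer.padding_side = "left"

batch = tokenizer(
    ["Hello!", "Summarize the Llama 3 chat template in one line."],
    padding=True,
    return_tensors="pt",
)
# Pass pad_token_id to generate() as well to silence the warning:
# model.generate(**batch, pad_token_id=tokenizer.pad_token_id)
```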

I've updated the generation_config.json for the merge and the abliterated version. I've also fixed the tokenizer_config.json for the abliterated version. I think the weights and the chat templates are correct now. Let me know if I missed something.
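
A quick way to sanity-check the updated files, under the same placeholder-repo assumption:

```python
from transformers import AutoTokenizer, GenerationConfig

model_id = "mlabonne/Daredevil-8B"  # placeholder repo id

gen_cfg = GenerationConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

print(gen_cfg.eos_token_id)                 # the Llama 3 EOS id(s)
print(gen_cfg.pad_token_id)                 # None is expected, per the note above
print(tokenizer.chat_template is not None)  # True once tokenizer_config.json carries the template
```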

Yes, sorry, the pad token was my oversight, and I got somewhat confused by the naming above. It was the ordinary Daredevil-abliterated that lacked a template; it looks good now.

Both the ordinary and the neural abliterated models seem to have an incorrect _name_or_path, but I don't see much else wrong (that is, if the weights for the neural abliterated model are indeed meant to be float16 and not bfloat16 like the rest of the family).
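
One way to verify both points from the raw files, rather than through a loader that may rewrite them; a sketch with a placeholder repo id:

```python
import json
from huggingface_hub import hf_hub_download

repo_id = "mlabonne/NeuralDaredevil-8B-abliterated"  # placeholder repo id

path = hf_hub_download(repo_id=repo_id, filename="config.json")
with open(path) as f:
    cfg = json.load(f)

print(cfg.get("_name_or_path"))  # should name the published repo, not a stale local path
print(cfg.get("torch_dtype"))    # "bfloat16" would match the rest of the family
```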
