Expecting bi-modal distribution of probabilities

#9
by christianclough - opened

Hello, thanks for the great paper, and publishing the model here!

I'm getting some unusual results. I'm computing large numbers of masked-token probabilities on human DNA with Nucleotide Transformer, DNABERT, and DNABERT-2. For every model except DNABERT-2, I see a decent bi-modal probability distribution (one group of high probabilities and one of low). For DNABERT-2 the distribution is unimodal and concentrated at low probabilities.
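To make "masked-token probability" concrete, here is a minimal sketch of the quantity in question: the softmax probability a model assigns to the true token at a masked position, binned into a histogram to inspect the shape of the distribution. It is plain Python with made-up logits over a toy 4-token vocabulary, not output from any of the models above. The names `masked_token_probability` and `histogram` are hypothetical helpers, not part of any library.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def masked_token_probability(logits, true_id):
    """Probability assigned to the true token at one masked position."""
    return softmax(logits)[true_id]

def histogram(probs, n_bins=10):
    """Count probabilities in equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for p in probs:
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        counts[idx] += 1
    return counts

# ASSUMPTION: made-up logits for three masked positions, toy vocabulary of 4.
example_logits = [
    [5.0, 0.0, 0.0, 0.0],  # confident and correct
    [0.0, 0.0, 0.0, 0.0],  # uniform guess
    [0.0, 4.0, 0.0, 0.0],  # confident but wrong
]
example_true_ids = [0, 2, 0]

example_probs = [
    masked_token_probability(logits, true_id)
    for logits, true_id in zip(example_logits, example_true_ids)
]
example_counts = histogram(example_probs, n_bins=5)
print(example_probs)
print(example_counts)
```

With a real model, the logits would come from the masked-LM head at each masked position, and the histogram would be built over many positions. A bi-modal result shows up as counts piling into both the lowest and highest bins.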

Is inference via HuggingFace definitely working for everyone? Note I'm using the model in Google Colab.
