Update README.md
Now in ```Trainer``` (or ```CustomTrainer``` if overridden), in ```compute_loss(...)```:
```outputs = model(**inputs, return_dict=True, output_attentions=True)```
Activate the extraction of attention with ```output_attentions=True``` (```return_dict=True``` is optional).
You can now extract the attention from ```outputs.attentions```.
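The call above can be wrapped in a small helper, sketched below. This is a sketch only: ```compute_loss_with_attentions``` is a hypothetical name, and the model is assumed to follow the transformers convention that ```return_dict=True``` with ```output_attentions=True``` yields an output object exposing ```.loss``` and ```.attentions```.

```python
def compute_loss_with_attentions(model, inputs):
    """Run a forward pass with attention extraction enabled.

    Assumes a transformers-style model: calling it with return_dict=True
    and output_attentions=True returns an output object that exposes
    .loss and .attentions (a tuple with one tensor per layer).
    """
    outputs = model(**inputs, return_dict=True, output_attentions=True)
    return outputs.loss, outputs.attentions
```

Inside a ```CustomTrainer```, the same call would sit in ```compute_loss``` before the loss is returned.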
Note that ```outputs.attentions``` is a tuple with one entry per layer (typically 12), so ```outputs.attentions[-1]``` refers to the attention of the last layer.
Read more about model outputs here: https://huggingface.co/docs/transformers/v4.40.2/en/main_classes/output#transformers.utils.ModelOutput
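For intuition on the indexing, here is a toy stand-in (no real model is run; the per-layer shape is the one documented for transformers attention outputs, shown for a BERT-style model with 12 layers):

```python
# Toy stand-in for outputs.attentions: a tuple with one entry per layer.
# In transformers, each entry is a tensor of shape
# (batch_size, num_heads, seq_len, seq_len); here we just record the shape.
num_layers, batch_size, num_heads, seq_len = 12, 1, 12, 16
attentions = tuple((batch_size, num_heads, seq_len, seq_len) for _ in range(num_layers))

last_layer = attentions[-1]   # attention of the last layer
print(len(attentions))        # 12: one entry per layer
print(last_layer)             # (1, 12, 16, 16)
```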

I'm also not using Triton, therefore I cannot guarantee that it will work with it.

I also read that there were some problems with extracting attention when using Flash Attention here: https://github.com/huggingface/transformers/issues/28903
Not sure if that is relevant for us, since it's about Mistral models.

I'm still exploring this attention extraction, so please don't assume it works 100%. I'll update the repository when I'm sure.

The official link to DNABERT2: [DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome](https://arxiv.org/pdf/2306.15006.pdf).