jaandoui committed
Commit 65dd5c9
1 Parent(s): cc8cdbf

Update README.md

Files changed (1)
  1. README.md +3 -0
README.md CHANGED
@@ -21,6 +21,7 @@ Now in ```Trainer``` (or ```CustomTrainer``` if overwritten) in ```compute_loss(
```outputs = model(**inputs, return_dict=True, output_attentions=True)```
Here, ```output_attentions=True``` activates the extraction of attention (```return_dict=True``` is optional).
You can now extract the attention from ```outputs.attentions```.
+ Note that the output has an additional dimension, usually of size 12, that indexes the layers: ```outputs.attentions[-1]``` refers to the attention of the last layer.
Read more about model outputs here: https://huggingface.co/docs/transformers/v4.40.2/en/main_classes/output#transformers.utils.ModelOutput
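
Putting the above together, here is a minimal sketch of the extraction inside a custom trainer. It assumes the standard Hugging Face ```Trainer``` and ```ModelOutput``` behavior (not anything specific to DNABERT2's custom code), and the head-averaging at the end is purely illustrative:

```python
from transformers import Trainer

class CustomTrainer(Trainer):
    """Trainer that also extracts attention maps inside compute_loss."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # output_attentions=True asks the model to return attention weights;
        # return_dict=True makes `outputs` an attribute-accessible ModelOutput.
        outputs = model(**inputs, return_dict=True, output_attentions=True)
        loss = outputs.loss  # assumes "labels" is present in `inputs`

        # For a standard 12-layer encoder, outputs.attentions is a tuple of
        # 12 tensors, one per layer, each of shape
        # (batch_size, num_heads, seq_len, seq_len).
        last_layer_attention = outputs.attentions[-1]

        # Illustrative: average over heads to get one map per sequence.
        avg_attention = last_layer_attention.mean(dim=1)  # (batch, seq, seq)

        return (loss, outputs) if return_outputs else loss
```

As a quick sanity check on the layer indexing, ```len(outputs.attentions)``` should equal the number of layers (12 for a BERT-base-style encoder).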

I'm also not using Triton, so I cannot guarantee that this will work with it.
 
@@ -28,6 +29,8 @@
I also read that there were some problems with extracting attention when using Flash Attention: https://github.com/huggingface/transformers/issues/28903
I'm not sure whether that is relevant for us, since it's about Mistral models.

+ I'm still exploring this attention extraction, so please don't assume it works 100%. I'll update the repository once I'm sure.
+
The official link to DNABERT2: [DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome](https://arxiv.org/pdf/2306.15006.pdf).