readme
README.md
CHANGED
@@ -55,13 +55,14 @@ With smaller block size (lower ressources):
 
 | Length | Sparse Type | Block Size | Sparsity | Connexions | R1 | R2 | RL | RLsum |
 |:------ |:------------ |:---------- |:-------- | :--------- |:----- |:----- |:----- |:----- |
+| 4096 | Local | 64 | 0 | 192 | 45.74 | 20.26 | 27.51 | 41.99 |
+| 4096 | Local | 32 | 0 | 96 | 42.69 | 17.83 | 25.62 | 38.89 |
 | 4096 | Pooling | 32 | 4 | 160 | 44.60 | 19.35 | 26.83 | 40.85 |
 | 4096 | Stride | 32 | 4 | 160 | 45.52 | 20.07 | 27.39 | 41.75 |
 | 4096 | Block Stride | 32 | 4 | 160 | 45.30 | 19.89 | 27.22 | 41.54 |
 | 4096 | Norm | 32 | 4 | 160 | 44.30 | 19.05 | 26.57 | 40.47 |
 | 4096 | LSH | 32 | 4 | 160 | 44.53 | 19.27 | 26.84 | 40.74 |
 
-
 ## Model description
 The model relies on Local-Sparse-Global attention to handle long sequences:
 ![attn](attn.png)
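The `Connexions` column in the table follows a simple pattern that the short sketch below makes explicit. This is an inference from the numbers in the table only, not a formula stated in the README: it assumes each query attends to three blocks of local keys (its own block plus one neighbour on each side) and, when sparsity is non-zero, to two additional blocks of sparse keys. The helper name `connections_per_query` is hypothetical, and global tokens are not counted.

```python
def connections_per_query(block_size: int, sparsity: int) -> int:
    """Hypothetical helper: keys attended per query, global tokens excluded.

    Assumption fitted to the table, not taken from the model code:
    - local attention covers 3 blocks (own block + one neighbour each side)
    - sparse attention, when enabled, adds 2 more blocks of selected keys
    """
    local = 3 * block_size
    sparse = 2 * block_size if sparsity > 0 else 0
    return local + sparse


# (sparse type, block size, sparsity, Connexions value reported in the table)
rows = [
    ("Local", 64, 0, 192),
    ("Local", 32, 0, 96),
    ("Pooling", 32, 4, 160),
    ("Stride", 32, 4, 160),
    ("Block Stride", 32, 4, 160),
    ("Norm", 32, 4, 160),
    ("LSH", 32, 4, 160),
]

for name, block_size, sparsity, reported in rows:
    computed = connections_per_query(block_size, sparsity)
    status = "matches" if computed == reported else "differs"
    print(f"{name:<12} block={block_size:<3} sparsity={sparsity} -> {computed} ({status})")
```

Read this way, the two added `Local` rows bracket the sparse variants: block size 32 without sparse keys uses fewer connections (96) and scores lower, while block size 64 uses more (192) and scores slightly higher than any 160-connection configuration.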