---
widget:
- text: "KOMMISSIONENS BESLUTNING\naf 6. marts 2006\nom klassificering af visse byggevarers ydeevne med hensyn til reaktion ved brand for så vidt angår trægulve samt vægpaneler og vægbeklædning i massivt træ\n(meddelt under nummer K(2006) 655"
datasets:
- multi_eurlex
metrics:
- f1
model-index:
- name: coastalcph/danish-legal-longformer-eurlex-sd
  results:
  - task:
      type: text-classification
      name: Danish EURLEX (Level 2)
    dataset:
      name: multi_eurlex
      type: multi_eurlex
      config: multi_eurlex
      split: validation
    metrics:
    - name: Micro-F1
      type: micro-f1
      value: 0.76144
    - name: Macro-F1
      type: macro-f1
      value: 0.52878
---
# Model description
This model is a fine-tuned version of [coastalcph/danish-legal-longformer-base](https://huggingface.co/coastalcph/danish-legal-longformer-base) on the Danish part of the [MultiEURLEX](https://huggingface.co/datasets/multi_eurlex) dataset, trained with an additional Spectral Decoupling penalty ([Pezeshki et al., 2020](https://arxiv.org/abs/2011.09468)).
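
Spectral Decoupling, as described by Pezeshki et al., adds an L2 penalty on the network's output logits to the task loss. The sketch below illustrates the idea in plain Python for a multi-label setting. The binary cross-entropy base loss, the averaging over labels, the coefficient `lam`, and the function names are assumptions made for illustration; they are not taken from this model's training code.

```python
import math
from typing import Sequence


def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy for one logit/target pair."""
    # Equivalent to -[t * log(sigmoid(x)) + (1 - t) * log(1 - sigmoid(x))]
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def spectral_decoupling_loss(logits: Sequence[float],
                             targets: Sequence[float],
                             lam: float = 0.1) -> float:
    """Mean BCE over labels plus (lam / 2) * mean squared logit.

    `lam` is a hypothetical penalty coefficient, not the value used for this model.
    """
    n = len(logits)
    task_loss = sum(bce_with_logits(x, t) for x, t in zip(logits, targets)) / n
    penalty = 0.5 * lam * sum(x * x for x in logits) / n
    return task_loss + penalty
```

With `lam=0` the function reduces to the plain task loss; larger values push the logits toward zero, which is the regularising effect the penalty is meant to have.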
## Training and evaluation data
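The Danish part of the [MultiEURLEX](https://huggingface.co/datasets/multi_eurlex) dataset. The Micro-F1 (0.76144) and Macro-F1 (0.52878) scores reported in the model metadata are measured on the validation split, using the Level 2 label set.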
## Use of Model
### As a text classifier:
```python
import os

from transformers import pipeline

# Init text classification pipeline
text_cls_pipe = pipeline(task="text-classification",
                         model="coastalcph/danish-legal-longformer-eurlex-sd",
                         # Read the access token from the environment instead of hard-coding it
                         # (HF_TOKEN is an assumed variable name; None means anonymous access)
                         use_auth_token=os.environ.get("HF_TOKEN"))

# Encode and classify document
predictions = text_cls_pipe("KOMMISSIONENS BESLUTNING\naf 6. marts 2006\nom klassificering af visse byggevarers "
                            "ydeevne med hensyn til reaktion ved brand for så vidt angår trægulve samt vægpaneler "
                            "og vægbeklædning i massivt træ\n(meddelt under nummer K(2006) 655")

# Print prediction
print(predictions)
# [{'label': 'building and public works', 'score': 0.9626012444496155}]
```
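
The pipeline call above prints only the top-scoring label, while MultiEURLEX is a multi-label task in which a document can carry several labels. The helper below is a minimal sketch of how per-label scores could be turned into a label set. It assumes the scores for all labels have already been obtained (for instance by passing `return_all_scores=True` to the pipeline call, which this Transformers version should accept) and that they behave like independent per-label probabilities. The `select_labels` name and the 0.5 threshold are illustrative choices, not values from the model card.

```python
from typing import Dict, List


def select_labels(label_scores: List[Dict], threshold: float = 0.5) -> List[str]:
    """Return the labels scoring at least `threshold`, highest score first."""
    kept = [item for item in label_scores if item["score"] >= threshold]
    kept.sort(key=lambda item: item["score"], reverse=True)
    return [item["label"] for item in kept]


# Hypothetical usage with the pipeline defined earlier (`document` is your input text):
# all_scores = text_cls_pipe(document, return_all_scores=True)[0]
# print(select_labels(all_scores, threshold=0.5))
```

The threshold is best tuned on held-out data, since a lower value favours recall and a higher one favours precision.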
### As a feature extractor (document embedder):
```python
import os

import numpy as np
from transformers import pipeline

# Init feature extraction pipeline
feature_extraction_pipe = pipeline(task="feature-extraction",
                                   model="coastalcph/danish-legal-longformer-eurlex-sd",
                                   # Read the access token from the environment instead of hard-coding it
                                   # (HF_TOKEN is an assumed variable name; None means anonymous access)
                                   use_auth_token=os.environ.get("HF_TOKEN"))

# Encode document; the pipeline returns nested lists shaped [1, num_tokens, hidden_size]
token_wise_features = feature_extraction_pipe("KOMMISSIONENS BESLUTNING\naf 6. marts 2006\nom klassificering af visse byggevarers "
                                              "ydeevne med hensyn til reaktion ved brand for så vidt angår trægulve samt vægpaneler "
                                              "og vægbeklædning i massivt træ\n(meddelt under nummer K(2006) 655")

# Use CLS token representation as document embedding
document_features = np.array(token_wise_features[0][0])
print(document_features.shape)
# (768,)
```
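
One common use of such document embeddings is comparing documents by cosine similarity. The function below is a small, dependency-free sketch of that comparison. The `cosine_similarity` name is illustrative, and whether CLS embeddings from this model give meaningful similarities for a given task is something to check empirically.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical usage with two embeddings produced as in the example above:
# similarity = cosine_similarity(document_features_a, document_features_b)
```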
## Framework versions
- Transformers 4.18.0
- PyTorch 1.12.0+cu113
- Datasets 2.0.0
- Tokenizers 0.12.1