widget:
  - text: >-
      KOMMISSIONENS BESLUTNING

      af 6. marts 2006

      om klassificering af visse byggevarers ydeevne med hensyn til reaktion ved
      brand for så vidt angår trægulve samt vægpaneler og vægbeklædning i
      massivt træ

      (meddelt under nummer K(2006) 655
datasets:
  - multi_eurlex
metrics:
  - accuracy
model-index:
  - name: coastalcph/danish-legal-longformer-eurlex
    results:
      - task:
          type: text-classification
          name: Danish EURLEX (Level 2)
        dataset:
          name: multi_eurlex
          type: multi_eurlex
          config: multi_eurlex
          split: validation
        metrics:
          - name: Micro-F1
            type: micro-f1
            value: 0.75748
          - name: Macro-F1
            type: macro-f1
            value: 0.52883

Model description

This model is a fine-tuned version of coastalcph/danish-legal-longformer-base, trained on the Danish part of the MultiEURLEX dataset.

Training and evaluation data

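The model was trained and evaluated on the Danish part of the MultiEURLEX dataset. The reported Micro-F1 and Macro-F1 scores were computed on its validation split.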
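The two reported averages weight labels differently, which can make them diverge. The following is a minimal sketch of the difference on made-up toy labels, not the evaluation code used for this model. The function name `f1_scores`, the toy matrices, and the convention that a label with no true or predicted positives scores 0 are all assumptions made for illustration.

```python
import numpy as np


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for binary multi-label indicator matrices.

    Rows are documents, columns are labels. Hypothetical helper for illustration.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)

    # Per-label counts of true positives, false positives, false negatives
    tp = (y_true & y_pred).sum(axis=0).astype(float)
    fp = (~y_true & y_pred).sum(axis=0).astype(float)
    fn = (y_true & ~y_pred).sum(axis=0).astype(float)

    def f1(tp, fp, fn):
        denom = 2 * tp + fp + fn
        # Assumption: a label with no positives at all gets F1 = 0
        return np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)

    # Micro: pool the counts over all labels, then compute one F1
    micro = float(f1(tp.sum(), fp.sum(), fn.sum()))
    # Macro: compute F1 per label, then take the unweighted mean
    macro = float(f1(tp, fp, fn).mean())
    return micro, macro


# Toy data: label 0 is frequent and always predicted correctly,
# labels 1 and 2 are rare and always missed.
y_true = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 0, 0]]

micro_f1, macro_f1 = f1_scores(y_true, y_pred)
print(micro_f1, macro_f1)
```

In this toy case the frequent label dominates the micro average, while the two missed rare labels pull the macro average down.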

Use of Model

As a text classifier:

from transformers import pipeline

# Init text classification pipeline
# Replace the placeholder with your own Hugging Face access token if one is required
text_cls_pipe = pipeline(task="text-classification",
                         model="coastalcph/danish-legal-longformer-eurlex",
                         use_auth_token="<YOUR_HF_TOKEN>")

# Encode and Classify document
predictions = text_cls_pipe("KOMMISSIONENS BESLUTNING\naf 6. marts 2006\nom klassificering af visse byggevarers "
                            "ydeevne med hensyn til reaktion ved brand for så vidt angår trægulve samt vægpaneler "
                            "og vægbeklædning i massivt træ\n(meddelt under nummer K(2006) 655")

# Print prediction
print(predictions)
# [{'label': 'building and public works', 'score': 0.9626012444496155}]

As a feature extractor (document embedder):

from transformers import pipeline
import numpy as np

# Init feature extraction pipeline
# Replace the placeholder with your own Hugging Face access token if one is required
feature_extraction_pipe = pipeline(task="feature-extraction",
                                   model="coastalcph/danish-legal-longformer-eurlex",
                                   use_auth_token="<YOUR_HF_TOKEN>")

# Encode document; the output is a nested list indexed as [batch][token][hidden_dim]
token_wise_features = feature_extraction_pipe("KOMMISSIONENS BESLUTNING\naf 6. marts 2006\nom klassificering af visse byggevarers "
                                              "ydeevne med hensyn til reaktion ved brand for så vidt angår trægulve samt vægpaneler "
                                              "og vægbeklædning i massivt træ\n(meddelt under nummer K(2006) 655")

# Use CLS token representation as document embedding
# (convert to a NumPy array so that .shape is available)
document_features = np.array(token_wise_features[0][0])

print(document_features.shape)
# (768,)
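One common use of such document embeddings is comparing documents by cosine similarity. The sketch below is an illustration under stated assumptions, not part of the original model card: `cosine_similarity` is a hypothetical helper, and the random 768-dimensional vectors are stand-ins for real `document_features` arrays so that the snippet runs without downloading the model.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors (hypothetical helper)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Assumption: treat a zero vector as dissimilar to everything
        return 0.0
    return float(np.dot(a, b) / denom)


# Stand-in embeddings; in practice these would be document_features
# arrays produced by the feature extraction pipeline.
rng = np.random.default_rng(0)
doc_a = rng.normal(size=768)
doc_b = doc_a + 0.1 * rng.normal(size=768)  # slightly perturbed copy of doc_a
doc_c = rng.normal(size=768)                # unrelated vector

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
print(sim_ab, sim_ac)
```

With real embeddings, each vector would come from a separate call to the feature extraction pipeline, one per document.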

Framework versions

  • Transformers 4.18.0
  • PyTorch 1.12.0+cu113
  • Datasets 2.0.0
  • Tokenizers 0.12.1