---
license: mit
---
|
This is a NER (named entity recognition) model for detecting and extracting citations from American legal documents.
|
|
|
Ignore the widget on the model card page; see below for usage. |
|
|
|
## How to Use the Model |
|
|
|
This model outputs token-level predictions, which should be processed as follows to obtain meaningful labels for each token: |
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("ss108/legal-citation-bert")
model = AutoModelForTokenClassification.from_pretrained("ss108/legal-citation-bert")

text = "Your example text here"
inputs = tokenizer(text, return_tensors="pt", padding=True)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring label id for each token
logits = outputs.logits
predictions = torch.argmax(logits, dim=-1)

# Map token ids and label ids back to readable strings
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
predicted_labels = [model.config.id2label[p.item()] for p in predictions[0]]

components = []
for token, label in zip(tokens, predicted_labels):
    components.append(f"{token} : {label}")

concat = " ; ".join(components)
print(concat)
```
|