File size: 2,851 Bytes
36151b0
 
d7f80ce
 
 
 
 
 
 
 
 
 
d176b74
d7f80ce
36151b0
aefef28
6b5d591
d7f80ce
6b5d591
d7f80ce
aefef28
be723a9
d176b74
 
 
 
 
 
 
 
f877f67
 
d176b74
 
 
973f6da
d176b74
f877f67
612fc9b
 
f877f67
612fc9b
d176b74
 
 
 
 
 
 
612fc9b
 
 
 
 
 
 
d176b74
 
79aa539
d176b74
 
a939888
7db6daa
 
a939888
 
 
 
 
 
 
 
 
7db6daa
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
---
license: mit
datasets:
- numind/NuNER
library_name: gliner
language:
- en
pipeline_tag: token-classification
tags:
- entity recognition
- NER
- named entity recognition
- zero shot
- zero-shot
---

NuNER Zero-span is the span-prediction version of [NuNER Zero](https://huggingface.co/numind/NuNER_Zero/edit/main/README.md).

NuNER Zero-span shows slightly better performance than NuNER Zero but cannot detect entities that are larger than 12 tokens.

<p align="center">
<img src="zero_shot_performance_span.png" width="600">
</p>

## Installation & Usage

```
!pip install gliner
```

**NuZero requires labels to be lower-cased**

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("numind/NuNerZero_span")

# NuZero requires labels to be lower-cased!
labels = ["organization", "initiative", "project"]
labels = [l.lower() for l in labels]

text = "At the annual technology summit, the keynote address was delivered by a senior member of the Association for Computing Machinery Special Interest Group on Algorithms and Computation Theory, which recently launched an expansive initiative titled 'Quantum Computing and Algorithmic Innovations: Shaping the Future of Technology'. This initiative explores the implications of quantum mechanics on next-generation computing and algorithm design and is part of a broader effort that includes the 'Global Computational Science Advancement Project'. The latter focuses on enhancing computational methodologies across scientific disciplines, aiming to set new benchmarks in computational efficiency and accuracy."

entities = model.predict_entities(text, labels)

for entity in entities:
    print(entity["text"], "=>", entity["label"])
```

```
Association for Computing Machinery Special Interest Group on Algorithms and Computation Theory => organization
Quantum Computing and Algorithmic Innovations: Shaping the Future of Technology => initiative
Global Computational Science Advancement Project => project
```


## Fine-tuning

A fine-tuning script can be found [here](https://colab.research.google.com/drive/1fu15tWCi0SiQBBelwB-dUZDZu0RVfx_a?usp=sharing).


## Citation
### This work
```bibtex
@misc{bogdanov2024nuner,
      title={NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data}, 
      author={Sergei Bogdanov and Alexandre Constantin and Timothée Bernard and Benoit Crabbé and Etienne Bernard},
      year={2024},
      eprint={2402.15343},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
### Previous work
```bibtex
@misc{zaratiana2023gliner,
      title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer}, 
      author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
      year={2023},
      eprint={2311.08526},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```