---
license: mit
datasets:
- numind/NuNER
library_name: gliner
language:
- en
pipeline_tag: token-classification
tags:
- entity recognition
- NER
- named entity recognition
- zero shot
- zero-shot
---

NuZero is a family of zero-shot entity recognition models inspired by [GLiNER](https://huggingface.co/papers/2311.08526) and built with insights gathered throughout our work on [NuNER](https://huggingface.co/collections/numind/nuner-token-classification-and-ner-backbones-65e1f6e14639e2a465af823b).

NuZero span is a more powerful version of GLiNER-large-v2.1, surpassing it by 4% on average. It is trained on a diverse internal dataset tailored to real-life use cases.

<p align="center">
<img src="zero_shot_performance_span.png" alt="Zero-shot NER performance of NuZero span compared to GLiNER-large-v2.1">
</p>

## Installation & Usage

```bash
pip install gliner
```

**NuZero requires labels to be lower-cased**
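Because labels are matched in lower case, a small defensive normalization step before prediction can avoid silent mismatches. The helper below is a hypothetical convenience, not part of the `gliner` API:

```python
# Hypothetical helper: lower-case and de-duplicate labels while
# preserving their original order, so mixed-case input still works.
def normalize_labels(labels):
    seen = set()
    out = []
    for label in labels:
        key = label.lower()
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out

print(normalize_labels(["Person", "AWARD", "person"]))  # → ['person', 'award']
```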

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("numind/NuZero_span")

# NuZero requires labels to be lower-cased!
labels = ["person", "award", "date", "competitions", "teams"]

# Example text; replace with your own.
text = """
Lionel Messi won the Ballon d'Or after leading Argentina to victory
in the 2022 FIFA World Cup.
"""

entities = model.predict_entities(text, labels)

for entity in entities:
    print(entity["text"], "=>", entity["label"])
```
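`predict_entities` returns a list of dicts that (in current `gliner` versions) carry `text`, `label`, `start`, `end`, and `score` fields. A small post-processing sketch, assuming only the `text` and `label` keys shown in the loop above:

```python
from collections import defaultdict

# Hypothetical post-processing step: group predicted entity texts by label.
def group_by_label(entities):
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["label"]].append(entity["text"])
    return dict(grouped)

sample = [
    {"text": "Lionel Messi", "label": "person"},
    {"text": "Ballon d'Or", "label": "award"},
    {"text": "2023", "label": "date"},
    {"text": "Inter Miami", "label": "teams"},
]
print(group_by_label(sample))
```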

## Fine-tuning




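The card does not include a fine-tuning recipe yet. As a rough sketch, GLiNER-family models are commonly fine-tuned on JSON records pairing tokenized text with token-level span annotations `[start, end, label]`. The field names below (`tokenized_text`, `ner`) are an assumption based on the public GLiNER training examples, not on this model card; check the GLiNER repository for the authoritative data format and training script.

```python
import json

# Assumed GLiNER-style training record: tokenized text plus token-span
# annotations [start_token, end_token, label]. Labels are lower-cased to
# match NuZero's requirement. Field names are assumptions, not verified.
def make_record(tokens, spans):
    return {
        "tokenized_text": tokens,
        "ner": [[start, end, label.lower()] for start, end, label in spans],
    }

record = make_record(
    ["Messi", "won", "the", "World", "Cup", "in", "2022"],
    [(0, 0, "person"), (3, 4, "competitions"), (6, 6, "DATE")],
)
print(json.dumps(record))
```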
## Citation
```
@misc{bogdanov2024nuner,
      title={NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data}, 
      author={Sergei Bogdanov and Alexandre Constantin and Timothée Bernard and Benoit Crabbé and Etienne Bernard},
      year={2024},
      eprint={2402.15343},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```