Serega6678 committed on
Commit 637d6b7
1 parent: 2ca3de5

Update README.md

Files changed (1)
  1. README.md +76 -0
README.md CHANGED
@@ -1,3 +1,79 @@
---
license: mit
datasets:
- numind/NuNER
library_name: gliner
language:
- en
pipeline_tag: token-classification
tags:
- entity recognition
- NER
- named entity recognition
- zero shot
- zero-shot
---

NuZero is a family of zero-shot entity recognition models inspired by [GLiNER](https://huggingface.co/papers/2311.08526) and built with insights we gathered throughout our work on [NuNER](https://huggingface.co/collections/numind/nuner-token-classification-and-ner-backbones-65e1f6e14639e2a465af823b).

The key difference between NuZero Token and GLiNER is that NuZero Token can **detect entities longer than 12 tokens**. GLiNER classifies candidate spans of at most 12 tokens, whereas NuZero Token operates at the token level rather than at the span level, so the length of a detected entity is not capped. On average, NuZero Token also performs about 1% better.

<p align="center">
<img src="zero_shot_performance_unzero_token.png">
</p>
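
To make the span-level versus token-level difference concrete, here is a toy sketch in plain Python (no GLiNER dependency). It is a hypothetical illustration, not NuZero's or GLiNER's actual code: `enumerate_spans` and `merge_token_labels` are made-up helper names, the 12-token cap is taken from the paragraph above, and the per-token labels are assumed to come from some token classifier. A span-level model can only return entities among the candidate spans it enumerates, while a token-level model labels every token and merges runs of equal labels, so entity length is unbounded.

```python
from typing import List, Optional, Tuple

# Toy illustration only (hypothetical helpers, not GLiNER or NuZero internals).
MAX_SPAN_WIDTH = 12  # span-width cap taken from the 12-token limit described above


def enumerate_spans(n_tokens: int, max_width: int = MAX_SPAN_WIDTH) -> List[Tuple[int, int]]:
    """Span-level view: every candidate (start, end) span, end inclusive,
    no wider than max_width tokens. Longer entities are never candidates."""
    return [
        (start, end)
        for start in range(n_tokens)
        for end in range(start, min(start + max_width, n_tokens))
    ]


def merge_token_labels(token_labels: List[str]) -> List[Tuple[int, int, str]]:
    """Token-level view: each token carries a label ("O" means no entity).
    Runs of consecutive tokens with the same label become one
    (start, end, label) entity, end inclusive, with no length limit."""
    entities: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    # A trailing "O" sentinel closes an entity that runs to the last token.
    for i, label in enumerate(token_labels + ["O"]):
        prev = token_labels[i - 1] if i > 0 else "O"
        if start is not None and label != prev:
            entities.append((start, i - 1, prev))
            start = None
        if label != "O" and start is None:
            start = i
    return entities


if __name__ == "__main__":
    n = 20  # a hypothetical 20-token entity, e.g. a long competition name
    print(merge_token_labels(["competition"] * n))  # [(0, 19, 'competition')]
    print((0, n - 1) in enumerate_spans(n))  # False: too wide to be a candidate span
```

In this simplified scheme, two adjacent entities that share a label would be merged into one; token taggers commonly use a scheme such as BIO tags to keep them apart.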

## Installation & Usage

```bash
pip install gliner
```

**NuZero requires labels to be lower-cased.**

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("numind/NuZero_span")

# NuZero requires labels to be lower-cased!
labels = ["person", "award", "date", "competitions", "teams"]
labels = [l.lower() for l in labels]

# Text to annotate (fill in your own input)
text = """

"""

# Each prediction is a dict: "text" is the matched string, "label" its entity type
entities = model.predict_entities(text, labels)

for entity in entities:
    print(entity["text"], "=>", entity["label"])
```
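
The usage loop prints one line per prediction. If you would rather collect the results by entity type, a small post-processing helper can be applied to the returned list. The sketch below is hypothetical (`group_by_label` is not part of GLiNER); it assumes only the `"text"` and `"label"` keys that the loop already reads, and its sample input is hand-written, not real model output.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_label(entities: Iterable[dict]) -> Dict[str, List[str]]:
    """Collect each entity's surface text under its label, skipping repeated
    mentions of the same text and keeping first-seen order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        texts = grouped[entity["label"]]
        if entity["text"] not in texts:
            texts.append(entity["text"])
    return dict(grouped)


if __name__ == "__main__":
    # Hand-written stand-in for the list returned by model.predict_entities(...)
    sample = [
        {"text": "Jane Doe", "label": "person"},
        {"text": "Golden Boot", "label": "award"},
        {"text": "Jane Doe", "label": "person"},
    ]
    print(group_by_label(sample))  # {'person': ['Jane Doe'], 'award': ['Golden Boot']}
```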

## Fine-tuning

## Citation
### This work
```bibtex
@misc{bogdanov2024nuner,
      title={NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data},
      author={Sergei Bogdanov and Alexandre Constantin and Timothée Bernard and Benoit Crabbé and Etienne Bernard},
      year={2024},
      eprint={2402.15343},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

### Previous work
```bibtex
@misc{zaratiana2023gliner,
      title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
      author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
      year={2023},
      eprint={2311.08526},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```