NorbertRop committed on
Commit
d81d813
1 Parent(s): 8b4567f

Upload README.md with huggingface_hub

  1. README.md +77 -0
README.md ADDED
@@ -0,0 +1,77 @@
+ ---
+ language: pl
+ license: mit
+ tags:
+ - ner
+ datasets:
+ - clarin-pl/kpwr-ner
+ metrics:
+ - f1
+ - accuracy
+ - precision
+ - recall
+ widget:
+ - text: "Nazywam się Jan Kowalski i mieszkam we Wrocławiu."
+   example_title: "Example"
+ ---
+
+ # FastPDN
+
+ FastPolDeepNer (FastPDN) is a model for named entity recognition, designed to be easy to use, train, and configure. It succeeds [PolDeepNer2](https://gitlab.clarin-pl.eu/information-extraction/poldeepner2). The project implements a data-processing and training pipeline built on Hydra, PyTorch, PyTorch Lightning, and Transformers.
+
+ Source code: https://gitlab.clarin-pl.eu/grupa-wieszcz/ner/fast-pdn
+
+ ## How to use
+
+ To extract named entities from text:
+
+ ```python
+ from transformers import pipeline
+ ner = pipeline('ner', model='clarin-pl/FastPDN', aggregation_strategy='simple')
+
+ text = "Nazywam się Jan Kowalski i mieszkam we Wrocławiu."
+ ner_results = ner(text)
+ for output in ner_results:
+     print(output)
+
+ # {'entity_group': 'nam_liv_person', 'score': 0.9996054, 'word': 'Jan Kowalski', 'start': 12, 'end': 24}
+ # {'entity_group': 'nam_loc_gpe_city', 'score': 0.998931, 'word': 'Wrocławiu', 'start': 39, 'end': 48}
+ ```
+
+ To get the logits for every token in the text:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForTokenClassification
+
+ tokenizer = AutoTokenizer.from_pretrained("clarin-pl/FastPDN")
+ model = AutoModelForTokenClassification.from_pretrained("clarin-pl/FastPDN")
+
+ text = "Nazywam się Jan Kowalski i mieszkam we Wrocławiu."
+ encoded_input = tokenizer(text, return_tensors='pt')
+ output = model(**encoded_input)  # output.logits has shape (batch, seq_len, num_labels)
+ ```
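
The per-token logits can be turned into label strings by taking the argmax over the label dimension and looking the index up in the model's `id2label` mapping. The following is a minimal sketch in plain Python that works on nested lists rather than tensors. The helper name `logits_to_labels` and the toy three-label mapping are illustrative assumptions, not part of FastPDN; with the real model one would presumably pass `output.logits[0].tolist()` and `model.config.id2label` instead.

```python
from typing import Dict, List, Sequence


def logits_to_labels(
    token_logits: Sequence[Sequence[float]],
    id2label: Dict[int, str],
) -> List[str]:
    """Pick the highest-scoring label for each token."""
    labels = []
    for scores in token_logits:
        # argmax over the label dimension for this token
        best_id = max(range(len(scores)), key=lambda i: scores[i])
        labels.append(id2label[best_id])
    return labels


# Toy data: 3 tokens, 3 labels. The mapping is hypothetical; the real one
# is stored in model.config.id2label.
toy_id2label = {0: "O", 1: "B-nam_liv_person", 2: "B-nam_loc_gpe_city"}
toy_logits = [
    [4.0, 0.1, 0.2],
    [0.3, 5.1, 0.4],
    [0.2, 0.5, 3.9],
]
predicted = logits_to_labels(toy_logits, toy_id2label)
print(predicted)
```

Note that the predictions are per sub-word token and typically include positions for the tokenizer's special tokens, so they need to be aligned back to words before use. The `pipeline` call with `aggregation_strategy='simple'` shown earlier handles that grouping automatically.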
+
+ ## Training data
+ FastPDN was trained on the 82-class versions of the KPWr and CEN datasets (`kpwr_n82` and `cen_n82`). The annotation guidelines are available [here](https://clarin-pl.eu/dspace/bitstream/handle/11321/294/WytyczneKPWr-jednostkiidentyfikacyjne.pdf).
+
+ ## Pretraining
+ FastPDN models were fine-tuned from the following pretrained models:
+ - [herbert-base-cased](https://huggingface.co/allegro/herbert-base-cased)
+ - [distiluse-base-multilingual-cased-v1](https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v1)
+ ## Evaluation
+
+ Runs trained on `cen_n82` and `kpwr_n82`:
+ | name      | test/f1 | test/pdn2_f1 | test/acc | test/precision | test/recall |
+ |-----------|---------|--------------|----------|----------------|-------------|
+ | distiluse | 0.53    | 0.61         | 0.95     | 0.55           | 0.54        |
+ | herbert   | 0.68    | 0.78         | 0.97     | 0.7            | 0.69        |
+
+
+ ## Authors
+
+ - Grupa Wieszcze CLARIN-PL
+ - Wiktor Walentynowicz
+
+ ## Contact
+
+ - Norbert Ropiak (norbert.ropiak@pwr.edu.pl)