Taemin Lee committed on
Commit 7e1ac1e • 1 Parent(s): e925ea4
Files changed (4)
  1. .gitattributes +1 -0
  2. README.md +96 -1
  3. gliner_config.json +23 -0
  4. pytorch_model.bin +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ pytorch_model.bin filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,98 @@
1
  ---
2
- license: cc-by-nc-3.0
3
  ---
1
  ---
2
+ license: cc-by-nc-4.0
3
+ language:
4
+ - ko
5
+ pipeline_tag: token-classification
6
+ library_name: gliner
7
  ---
8
+
9
+
10
+ # Model Card for GLiNER-ko
11
+
12
+ GLiNER is a Named Entity Recognition (NER) model that can identify any entity type using a bidirectional transformer encoder (BERT-like). It offers a practical alternative to traditional NER models, which are limited to predefined entity types, and to Large Language Models (LLMs), which are flexible but too large and costly for resource-constrained scenarios.
13
+
14
+ This version was trained on **various Korean NER** datasets and is intended for research purposes. Commercially permissive versions are available (**urchade/gliner_smallv2**, **urchade/gliner_mediumv2**, **urchade/gliner_largev2**).
15
+
16
+ ## Links
17
+
18
+ * Paper: https://arxiv.org/abs/2311.08526
19
+ * Repository: https://github.com/urchade/GLiNER
20
+
21
+ ## Installation
22
+ To use this model, you must install the Korean fork of the GLiNER Python library and `python-mecab-ko`:
23
+ ```
24
+ !pip install git+https://github.com/taeminlee/GLiNER
25
+ !pip install python-mecab-ko
26
+ ```
27
+
28
+ ## Usage
29
+ Once you've installed the GLiNER library, you can import the `GLiNER` class. You can then load this model using `GLiNER.from_pretrained` and predict entities with `predict_entities`.
30
+
31
+ ```python
32
+ from gliner import GLiNER
33
+
34
+ model = GLiNER.from_pretrained("taeminlee/gliner_ko")
35
+
36
+ text = """
37
+ 피터 잭슨 경(, 1961년 10월 31일 ~ )은 뉴질랜드의 영화 감독, 각본가, 영화 프로듀서이다. J. R. R. 톨킨의 소설을 원작으로 한 《반지의 제왕 영화 3부작》(2001년~2003년)의 감독으로 가장 유명하다. 2005년에는 1933년작 킹콩의 리메이크작 《킹콩(2005)》의 감독을 맡았다.
38
+ """
39
+
40
+ tta_labels = ["ARTIFACTS", "ANIMAL", "CIVILIZATION", "DATE", "EVENT", "STUDY_FIELD", "LOCATION", "MATERIAL", "ORGANIZATION", "PERSON", "PLANT", "QUANTITY", "TIME", "TERM", "THEORY"]
41
+
42
+ entities = model.predict_entities(text, tta_labels)
43
+
44
+ for entity in entities:
45
+     print(entity["text"], "=>", entity["label"])
46
+ ```
47
+
48
+ ```
49
+ 피터 잭슨 경 => PERSON
50
+ 1961년 10월 31일 ~ => DATE
51
+ 뉴질랜드 => LOCATION
52
+ 영화 감독 => OCCUPATION
53
+ 각본가 => OCCUPATION
54
+ 영화 => OCCUPATION
55
+ 프로듀서 => OCCUPATION
56
+ J. R. R. 톨킨 => PERSON
57
+ 3부작 => QUANTITY
58
+ 2001년~2003년 => DATE
59
+ 감독 => OCCUPATION
60
+ 2005년 => DATE
61
+ 1933년작 => DATE
62
+ 킹콩 => ARTIFACTS
63
+ 킹콩 => ARTIFACTS
64
+ 2005 => DATE
65
+ 감독 => OCCUPATION
66
+ ```
67
+
68
+
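The list returned by `predict_entities` can be post-processed with plain Python. The following is a minimal sketch, assuming each entity is a dict with at least the `text` and `label` keys used in the loop above. The `group_by_label` helper and the hand-written `sample` list are hypothetical and only illustrate the shape of the data; they are not part of the GLiNER library.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group entity surface strings by label, keeping each string once per label."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["text"] not in grouped[entity["label"]]:
            grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


# Hand-written sample in the same shape as the predictions (hypothetical data,
# not actual objects returned by the model).
sample = [
    {"text": "피터 잭슨 경", "label": "PERSON"},
    {"text": "킹콩", "label": "ARTIFACTS"},
    {"text": "킹콩", "label": "ARTIFACTS"},
    {"text": "J. R. R. 톨킨", "label": "PERSON"},
]

grouped = group_by_label(sample)
for label, texts in grouped.items():
    print(label, "=>", texts)
```

Grouping like this collapses repeated mentions, which may be convenient when the same entity appears several times in one passage.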
69
+ ## Named Entity Recognition benchmark results
70
+
71
+ Evaluated on the [konne dev set](https://github.com/korean-named-entity/konne)
72
+
73
+ | Model | Precision (P) | Recall (R) | F1 |
74
+ |------------------|-----------|-----------|--------|
75
+ | Gliner-ko (t=0.5) | **72.51%** | **79.82%** | **75.99%** |
76
+ | Gliner Large-v2 (t=0.5) | 34.33% | 19.50% | 24.87% |
77
+ | Gliner Multi (t=0.5) | 40.94% | 34.18% | 37.26% |
78
+ | Pororo | 70.25% | 57.94% | 63.50% |
79
+
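The F1 column is consistent with the usual harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the table's P and R values as a sanity check. It assumes the reported scores use this standard definition; the `f1_score` helper is illustrative and is not taken from the evaluation script.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the benchmark table.
rows = {
    "Gliner-ko (t=0.5)": (72.51, 79.82, 75.99),
    "Gliner Large-v2 (t=0.5)": (34.33, 19.50, 24.87),
    "Gliner Multi (t=0.5)": (40.94, 34.18, 37.26),
    "Pororo": (70.25, 57.94, 63.50),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed F1 = {f1_score(p, r):.2f}, reported = {reported:.2f}")
```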
80
+ ## Model Authors
81
+ The model authors are:
82
+ * [Taemin Lee](http://tmkor.com)
83
+ * [Urchade Zaratiana](https://huggingface.co/urchade)
84
+ * Nadi Tomeh
85
+ * Pierre Holat
86
+ * Thierry Charnois
87
+
88
+ ## Citation
89
+ ```bibtex
90
+ @misc{zaratiana2023gliner,
91
+ title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
92
+ author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
93
+ year={2023},
94
+ eprint={2311.08526},
95
+ archivePrefix={arXiv},
96
+ primaryClass={cs.CL}
97
+ }
98
+ ```
gliner_config.json ADDED
@@ -0,0 +1,23 @@
1
+ {
2
+ "lr_encoder": "1e-5",
3
+ "lr_others": "5e-5",
4
+ "warmup_ratio": 0.1,
5
+ "max_width": 12,
6
+ "model_name": "lighthouse/mdeberta-v3-base-kor-further",
7
+ "fine_tune": true,
8
+ "subtoken_pooling": "first",
9
+ "hidden_size": 768,
10
+ "span_mode": "markerV0",
11
+ "dropout": 0.4,
12
+ "root_dir": "ablation_backbone",
13
+ "prev_path": "none",
14
+ "size_sup": -1,
15
+ "max_types": 25,
16
+ "shuffle_types": true,
17
+ "random_drop": true,
18
+ "max_neg_type_ratio": 1,
19
+ "max_len": 384,
20
+ "name": "largev2_ko_m",
21
+ "log_dir": "logs",
22
+ "tokenizer": "mecab-ko"
23
+ }
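Note that `lr_encoder` and `lr_others` are stored as JSON strings rather than numbers, so code that reads this file directly needs to cast them. A minimal sketch using only the standard library follows. It embeds a subset of the fields above as an inline string instead of reading the file, and it makes no claim about how the GLiNER library itself loads the config.

```python
import json

# Subset of gliner_config.json, copied from the diff above.
config_text = """
{
  "lr_encoder": "1e-5",
  "lr_others": "5e-5",
  "warmup_ratio": 0.1,
  "max_width": 12,
  "max_len": 384
}
"""

config = json.loads(config_text)

# The learning rates are strings in the file, so convert them explicitly.
lr_encoder = float(config["lr_encoder"])
lr_others = float(config["lr_others"])

print(lr_encoder, lr_others, config["max_width"], config["max_len"])
```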
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e9263ace777fa9929306d62f1c81378556657f5f3a9fcd2a6ceb6ba7e3e8a636
3
+ size 1209405350
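The three lines above are a Git LFS pointer, not the weights themselves: each line is a key followed by a space and a value. A small sketch of reading such a pointer is shown here; the `parse_lfs_pointer` helper is hypothetical and handles only the three keys present in this file.

```python
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:e9263ace777fa9929306d62f1c81378556657f5f3a9fcd2a6ceb6ba7e3e8a636
size 1209405350
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(pointer_text)
size_gib = pointer["size_bytes"] / 2**30
print(pointer["hash_algo"], pointer["size_bytes"], f"{size_gib:.2f} GiB")
```

The `size` field gives the byte count of the real `pytorch_model.bin`, so it can be used to estimate the download size before fetching the weights.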