---
language: hu
widget:
- text: ""
---
# Hungarian Named Entity Recognition (NER) Model

This model is a fine-tuned version of SZTAKI-HLT/hubert-base-cc, trained on the WikiANN dataset presented in the paper "Cross-lingual Name Tagging and Linking for 282 Languages".

# Fine-tuning parameters:
```python
task = "ner"
model_checkpoint = "SZTAKI-HLT/hubert-base-cc"
batch_size = 8
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']
max_length = 512
learning_rate = 2e-5
num_train_epochs = 3
weight_decay = 0.01
```
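The `label_list` above follows the standard BIO tagging scheme. As a minimal sketch (the mappings are a common convention for token-classification heads, not something stated in this card), the list can be turned into the `id2label`/`label2id` dictionaries that map between class indices and tag names:

```python
# Build the id/label mappings a token-classification head expects.
# label_list is copied from the fine-tuning parameters above.
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']

id2label = {i: label for i, label in enumerate(label_list)}
label2id = {label: i for i, label in enumerate(label_list)}

print(id2label[5])  # → 'B-LOC'
```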
# How to use:
```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model = AutoModelForTokenClassification.from_pretrained("akdeniz27/bert-base-hungarian-cased-ner")
tokenizer = AutoTokenizer.from_pretrained("akdeniz27/bert-base-hungarian-cased-ner")
ner = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="first")
ner("<your text here>")
```
Please refer to https://huggingface.co/transformers/_modules/transformers/pipelines/token_classification.html for how the `aggregation_strategy` parameter controls entity grouping.
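With `aggregation_strategy="first"`, the pipeline merges per-token B-/I- predictions into whole entities. To illustrate the grouping idea, here is a minimal sketch that merges raw BIO tags into entity spans; the `preds` list is an invented example, not actual output of this model:

```python
# Merge raw BIO token predictions into entity spans.
# `preds` is a made-up sample of per-word (word, tag) output.
preds = [
    ("Kovács", "B-PER"),
    ("János", "I-PER"),
    ("Budapesten", "B-LOC"),
    ("él", "O"),
]

def group_entities(preds):
    entities, current = [], None
    for word, tag in preds:
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            if current:
                entities.append(current)
            current = {"entity_group": tag[2:], "word": word}
        elif tag.startswith("I-") and current and current["entity_group"] == tag[2:]:
            # A matching I- tag extends the open entity.
            current["word"] += " " + word
        else:
            # "O" (or an inconsistent I- tag) closes any open entity.
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return entities

print(group_entities(preds))
# → [{'entity_group': 'PER', 'word': 'Kovács János'},
#    {'entity_group': 'LOC', 'word': 'Budapesten'}]
```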
# Reference test results:
* accuracy: 0.9774538310923768
* f1: 0.9462099085573904
* precision: 0.9425718667406271
* recall: 0.9498761426661113