julien-c (HF staff) committed on
Commit 3a0e2b4
1 parent: f761400

Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/SZTAKI-HLT/hubert-base-cc/README.md

Files changed (1)
  1. README.md +46 -0
README.md ADDED
@@ -0,0 +1,46 @@
+ ---
+ language: hu
+ license: apache-2.0
+ datasets:
+ - common_crawl
+ - wikipedia
+ ---
+
+ # huBERT base model (cased)
+
+ ## Model description
+
+ Cased BERT model for Hungarian, trained on the (filtered, deduplicated) Hungarian subset of Common Crawl and a snapshot of the Hungarian Wikipedia.
+
+ ## Intended uses & limitations
+
+ The model can be used like any other (cased) BERT model. It has been tested on the chunking and
+ named entity recognition (NER) tasks, and it set a new state of the art on the former.
+
+ ## Training
+
+ Details of the training data and procedure can be found in the PhD thesis linked below, with the caveat that the thesis only contains preliminary results
+ based on the Wikipedia subcorpus. Evaluation of the full model will appear in a future paper.
+
+ ## Eval results
+
+ When fine-tuned (via `BertForTokenClassification`) on chunking and NER, the model outperforms multilingual BERT, achieves state-of-the-art results on the
+ former task, and comes within 0.5% F1 of the state of the art on the latter. The exact scores are:
+
+ | NER | Chunking (minimal NP) | Chunking (maximal NP) |
+ |-----|-----------------------|-----------------------|
+ | 97.62% | **97.14%** | **96.97%** |
+
+ ### BibTeX entry and citation info
+
+ The training corpus, parameters, and evaluation methods are discussed in the
+ [following PhD thesis](https://hlt.bme.hu/en/publ/nemeskey_2020):
+
+ ```bibtex
+ @PhDThesis{Nemeskey:2020,
+   author = {Nemeskey, Dávid Márk},
+   title  = {Natural Language Processing Methods for Language Modeling},
+   year   = {2020},
+   school = {E\"otv\"os Lor\'and University}
+ }
+ ```
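The model card mentions fine-tuning with `BertForTokenClassification` but does not say how word-level NER or chunk labels were mapped onto the model's subword tokens. One common approach is assumed here rather than taken from the thesis. It labels only the first subword of each word and marks every other token with `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`, which `BertForTokenClassification` uses for its loss. The helper name `align_labels_to_tokens` and the label ids in the example are hypothetical.

```python
from typing import List, Optional

# -100 is the default ignore_index of torch.nn.CrossEntropyLoss, so tokens
# labelled with it do not contribute to the fine-tuning loss.
IGNORE_INDEX = -100


def align_labels_to_tokens(
    word_ids: List[Optional[int]], word_labels: List[int]
) -> List[int]:
    """Spread one label per word over subword tokens (hypothetical helper).

    word_ids[i] is the index of the word that token i came from, or None for
    special tokens such as [CLS] and [SEP].
    """
    aligned: List[int] = []
    previous_word: Optional[int] = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens carry no label.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous_word:
            # The first subword of a word gets that word's label.
            aligned.append(word_labels[word_id])
        else:
            # Continuation subwords of the same word are ignored.
            aligned.append(IGNORE_INDEX)
        previous_word = word_id
    return aligned


# Three words, the second one split into two subwords; label ids are made up.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [1, 2, 0]
print(align_labels_to_tokens(word_ids, word_labels))  # → [-100, 1, 2, -100, 0, -100]
```

With a Hugging Face fast tokenizer, the `word_ids` list can be obtained from `tokenizer(words, is_split_into_words=True).word_ids()`. The tokenizer should be loadable as `AutoTokenizer.from_pretrained("SZTAKI-HLT/hubert-base-cc")`, with the ID taken from the file history path above. Labelling every subword instead is an equally valid choice, but it needs care with `B-` tags on continuation pieces.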
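The eval results report NER and NP-chunking scores in terms of F1 but do not name the scoring script. The sketch below illustrates what such a score measures by computing a conlleval-style, micro-averaged span-level F1 from BIO tag sequences. A predicted span counts only if its type and both boundaries match a gold span exactly. The BIO encoding and the helper names `bio_spans` and `span_f1` are assumptions for illustration. This is not necessarily how the reported numbers were computed.

```python
from typing import List, Optional, Set, Tuple

# A labelled span: (type, start token index, end token index exclusive).
Span = Tuple[str, int, int]


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect typed spans from a BIO sequence such as ["B-PER", "I-PER", "O"]."""
    spans: Set[Span] = set()
    start = 0
    current: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final open span
        prefix, _, label = tag.partition("-")
        # An open span ends at "O", at a new "B-", or at an "I-" of another type.
        if current is not None and (prefix != "I" or label != current):
            spans.add((current, start, i))
            current = None
        # A "B-" starts a span; so does a stray "I-" (as in conlleval).
        if prefix == "B" or (prefix == "I" and current is None):
            start, current = i, label
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged span F1: a span counts only if type and boundaries match."""
    tp = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans, pred_spans = bio_spans(gold_tags), bio_spans(pred_tags)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: the person span is found, the location is mistyped as ORG.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]
print(span_f1(gold, pred))  # → 0.5
```

Precision and recall are pooled over the whole corpus before their harmonic mean is taken (micro-averaging). For NP chunking, the same code applies with tags such as `B-NP` and `I-NP`.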