## GigaBERT-v3
GigaBERT-v3 is a customized bilingual BERT model for English and Arabic. It was pre-trained on a large-scale corpus (Gigaword + OSCAR + Wikipedia) of ~10B tokens, and it achieves state-of-the-art zero-shot transfer performance from English to Arabic on information extraction (IE) tasks. More details can be found in the following paper:

```
@inproceedings{lan2020gigabert,
  author    = {Lan, Wuwei and Chen, Yang and Xu, Wei and Ritter, Alan},
  title     = {GigaBERT: Zero-shot Transfer Learning from English to Arabic},
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2020}
}
```
## Usage
```
from transformers import BertTokenizer, BertForTokenClassification

tokenizer = BertTokenizer.from_pretrained("lanwuwei/GigaBERT-v3-Arabic-and-English", do_lower_case=True)
model = BertForTokenClassification.from_pretrained("lanwuwei/GigaBERT-v3-Arabic-and-English")
```
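Under the hood, the `BertTokenizer` loaded above segments each word with the standard greedy longest-match-first WordPiece algorithm. As a rough illustration of that scheme (not GigaBERT's actual vocabulary — the toy vocabulary below is hypothetical), a minimal sketch:

```python
# Minimal sketch of greedy longest-match-first WordPiece tokenization,
# the segmentation scheme used by BERT-style tokenizers.
# The toy vocabulary here is hypothetical, NOT GigaBERT's real vocab.

def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split a single word into subword pieces drawn from `vocab`."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry a "##" prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matched: the whole word is unknown
        tokens.append(match)
        start = end
    return tokens

vocab = {"giga", "##bert", "token", "##ization"}
print(wordpiece_tokenize("gigabert", vocab))      # ['giga', '##bert']
print(wordpiece_tokenize("tokenization", vocab))  # ['token', '##ization']
```

The real tokenizer also lowercases input (because of `do_lower_case=True`) and applies whitespace/punctuation splitting before this per-word step.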
More code examples can be found [here](https://github.com/lanwuwei/GigaBERT).