File size: 978 Bytes
c92336e
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
## GigaBERT-v4
GigaBERT-v4 is a continued pre-training of [GigaBERT-v3](https://huggingface.co/lanwuwei/GigaBERT-v3-Arabic-and-English) on code-switched data, showing improved zero-shot transfer performance from English to Arabic on information extraction (IE) tasks. More details can be found in the following paper:

	@inproceedings{lan2020gigabert,
	  author     = {Lan, Wuwei and Chen, Yang and Xu, Wei and Ritter, Alan},
  	  title      = {GigaBERT: Zero-shot Transfer Learning from English to Arabic},
  	  booktitle  = {Proceedings of The 2020 Conference on Empirical Methods on Natural Language Processing (EMNLP)},
  	  year       = {2020}
  	} 

## Usage
```
from transformers import *
tokenizer = BertTokenizer.from_pretrained("lanwuwei/GigaBERT-v4-Arabic-and-English", do_lower_case=True)
model = BertForTokenClassification.from_pretrained("lanwuwei/GigaBERT-v4-Arabic-and-English")
```
More coda examples can be found [here](https://github.com/lanwuwei/GigaBERT).