	# GottBERT: a pure German language model

	## Introduction

	[GottBERT](http://arxiv.org/abs/2012.02110) is a RoBERTa-based language model pretrained on 145GB of German text.

	## Example usage

	### fairseq
	##### Load GottBERT from torch.hub (PyTorch >= 1.1):
	```python
	import torch
	gottbert = torch.hub.load('pytorch/fairseq', 'gottbert-base')
	gottbert.eval() # disable dropout (or leave in train mode to finetune)
	```

	##### Load GottBERT (for PyTorch 1.0 or custom models):
	```python
	# Download and extract the GottBERT model first (shell commands, not Python):
	#   wget https://dl.gottbert.de/fairseq/models/gottbert-base.tar.gz
	#   tar -xzvf gottbert-base.tar.gz

	# Load the model in fairseq
	from fairseq.models.roberta import GottbertModel
	gottbert = GottbertModel.from_pretrained('/path/to/gottbert')
	gottbert.eval() # disable dropout (or leave in train mode to finetune)
	```

	##### Filling masks:
	```python
	masked_line = 'Gott ist <mask> ! :)'
	gottbert.fill_mask(masked_line, topk=3)
	# [('Gott ist gut ! :)', 0.3642110526561737, ' gut'),
	# ('Gott ist überall ! :)', 0.06009674072265625, ' überall'),
	# ('Gott ist großartig ! :)', 0.0370681993663311, ' großartig')]
	```
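
The output above is a list of `(filled sentence, probability, predicted token)` tuples. As a minimal sketch of post-processing that list, the helper below picks the most likely fill. `best_fill` is a hypothetical name, not part of fairseq, and the sample data is copied from the output above rather than produced by running the model here.

```python
from typing import List, Tuple

# Each prediction mirrors the tuple layout shown in the output above:
# (filled sentence, probability, predicted token).
Prediction = Tuple[str, float, str]


def best_fill(predictions: List[Prediction]) -> Tuple[str, float]:
    """Return the whitespace-stripped token and probability of the most likely fill."""
    _sentence, prob, token = max(predictions, key=lambda p: p[1])
    return token.strip(), prob


# Sample data copied from the output above (not a fresh model run).
sample = [
    ('Gott ist gut ! :)', 0.3642110526561737, ' gut'),
    ('Gott ist überall ! :)', 0.06009674072265625, ' überall'),
    ('Gott ist großartig ! :)', 0.0370681993663311, ' großartig'),
]

token, prob = best_fill(sample)
print(f"{token}: {prob:.3f}")  # gut: 0.364
```

The predicted tokens carry a leading space, so stripping them is usually wanted before comparing or displaying them.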

	##### Extract features from GottBERT:

	```python
	# Extract the last layer's features
	line = "Der erste Schluck aus dem Becher der Naturwissenschaft macht atheistisch , aber auf dem Grunde des Bechers wartet Gott !"
	tokens = gottbert.encode(line)
	last_layer_features = gottbert.extract_features(tokens)
	assert last_layer_features.size() == torch.Size([1, 27, 768])

	# Extract all layers' features (layer 0 is the embedding layer)
	all_layers = gottbert.extract_features(tokens, return_all_hiddens=True)
	assert len(all_layers) == 13
	assert torch.all(all_layers[-1] == last_layer_features)
	```
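
One common way to turn the per-token features above into a single sentence vector is to average over the token dimension. This is a sketch under assumptions, not something this README prescribes: `mean_pool` is a hypothetical helper, and a random tensor with the shape asserted above stands in for the real `extract_features` output so the snippet runs without downloading the model.

```python
import torch


def mean_pool(features: torch.Tensor) -> torch.Tensor:
    """Average over the token dimension: (batch, tokens, hidden) -> (batch, hidden)."""
    return features.mean(dim=1)


# Stand-in for gottbert.extract_features(tokens); same shape as asserted above.
last_layer_features = torch.randn(1, 27, 768)

sentence_vector = mean_pool(last_layer_features)
print(sentence_vector.shape)  # torch.Size([1, 768])
```

This simple average also includes the positions of any special tokens added by `encode`; whether to exclude them is a design choice that depends on the downstream task.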

	## Citation
	If you use our work, please cite:

	```bibtex
	@misc{scheible2020gottbert,
	title={GottBERT: a pure German Language Model},
	author={Raphael Scheible and Fabian Thomczyk and Patric Tippmann and Victor Jaravine and Martin Boeker},
	year={2020},
	eprint={2012.02110},
	archivePrefix={arXiv},
	primaryClass={cs.CL}
	}
	```