ChristopherA08
/

IndoELECTRA

	---
	language: id
	datasets:
	- oscar
	---
	# IndoELECTRA (Indonesian ELECTRA Model)

	## Model description
	ELECTRA is a method for self-supervised language representation learning. This repository contains a pre-trained ELECTRA Base model (TensorFlow 1.15.0) trained on a large Indonesian corpus (about 16 GB of raw text, roughly 2 billion Indonesian words).
	IndoELECTRA is a pre-trained language model for the Indonesian language, based on the ELECTRA architecture.

	This model is the base version, which uses the electra-base config.
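To check which hyperparameters the published checkpoint actually carries, the config can be read from the Hub without loading the weights. The following is a minimal sketch, not part of the original card: `summarize_config` and the `FIELDS` tuple are hypothetical helpers introduced here, and it assumes the repository's config exposes the usual ELECTRA attribute names from `transformers`.

```python
from transformers import AutoConfig

# Attribute names used by ElectraConfig in transformers.
# FIELDS and summarize_config are hypothetical helpers for illustration.
FIELDS = (
    "hidden_size",
    "embedding_size",
    "num_hidden_layers",
    "num_attention_heads",
    "intermediate_size",
    "vocab_size",
)


def summarize_config(config):
    """Return the selected hyperparameters as a plain dict (None if absent)."""
    return {name: getattr(config, name, None) for name in FIELDS}


def load_and_summarize(model_id="ChristopherA08/IndoELECTRA"):
    """Download only the config file from the Hub and summarize it."""
    return summarize_config(AutoConfig.from_pretrained(model_id))


# Example (needs network access on first call):
# print(load_and_summarize())
```

No expected values are listed here on purpose; the printed dict is the authoritative source for this checkpoint.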

	## Intended uses & limitations

	#### How to use

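	```python
	from transformers import AutoModel, AutoTokenizer

	# Load the tokenizer and the pre-trained encoder from the Hugging Face Hub.
	tokenizer = AutoTokenizer.from_pretrained("ChristopherA08/IndoELECTRA")
	model = AutoModel.from_pretrained("ChristopherA08/IndoELECTRA")

	# Encode a sentence into token ids (special tokens are added automatically).
	print(tokenizer.encode("hai aku mau makan."))
	# [2, 8078, 1785, 2318, 1946, 18, 4]
	```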
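Beyond tokenization, the encoder can be used to produce sentence vectors. The sketch below is one possible approach rather than a method described by the author: it assumes the checkpoint loads as a PyTorch model through `AutoModel` (as in the snippet above) and uses mean pooling over non-padding tokens, which is a common choice and not something this card prescribes. `mean_pool` and `embed` are hypothetical helper names.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "ChristopherA08/IndoELECTRA"


def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sentence, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1.0)  # avoid division by zero
    return summed / counts


def embed(sentences, model_id=MODEL_ID):
    """Return one pooled vector per input sentence (assumed usage pattern)."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)
    return mean_pool(outputs.last_hidden_state, batch["attention_mask"])


# Example (downloads the model on first call):
# vectors = embed(["hai aku mau makan.", "selamat pagi"])
# print(vectors.shape)
```

Since the model is pre-trained only, these vectors are raw encoder outputs; for a downstream task the encoder would normally be fine-tuned.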

	## Training procedure

	The model was trained with Google's original TensorFlow code on an eight-core Google Cloud TPU v2.
	A Google Cloud Storage bucket was used for persistent storage of the training data and models.