KoELECTRA v2 (Base Generator)

Pretrained ELECTRA Language Model for Korean (koelectra-base-v2-generator)

For more detail, please see original repository.

Usage

Load model and tokenizer

>>> from transformers import ElectraModel, ElectraTokenizer

>>> model = ElectraModel.from_pretrained("monologg/koelectra-base-v2-generator")
>>> tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-base-v2-generator")

Tokenizer example

>>> from transformers import ElectraTokenizer
>>> tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-base-v2-generator")
>>> tokenizer.tokenize("[CLS] 한국어 ELECTRA를 공유합니다. [SEP]")
['[CLS]', '한국어', 'EL', '##EC', '##TRA', '##를', '공유', '##합니다', '.', '[SEP]']
>>> tokenizer.convert_tokens_to_ids(['[CLS]', '한국어', 'EL', '##EC', '##TRA', '##를', '공유', '##합니다', '.', '[SEP]'])
[2, 5084, 16248, 3770, 19059, 29965, 2259, 10431, 5, 3]

Example using ElectraForMaskedLM

from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="monologg/koelectra-base-v2-generator",
    tokenizer="monologg/koelectra-base-v2-generator"
)

print(fill_mask("나는 {} 밥을 먹었다.".format(fill_mask.tokenizer.mask_token)))
Downloads last month
13
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.