
roberta-base-japanese-char-luw-upos

Model Description

This is a RoBERTa model pre-trained on 青空文庫 (Aozora Bunko) texts for POS-tagging and dependency-parsing, derived from roberta-base-japanese-aozora-char. Every long-unit word is tagged with its UPOS (Universal Part-of-Speech) and FEATS.
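FEATS presumably follows the Universal Dependencies convention: a |-separated list of Name=Value pairs, with _ meaning the word has no features. The following is a minimal sketch, assuming that convention, of how such a string can be read into a dictionary. parse_feats is a hypothetical helper and is not part of transformers or esupar.

```python
def parse_feats(feats: str) -> dict[str, str]:
    """Turn a UD FEATS string such as 'Polarity=Neg' into a dict.

    Assumes the Universal Dependencies convention: '|'-separated
    Name=Value pairs, with '_' (or an empty string) meaning no features.
    """
    if feats in ("", "_"):
        return {}
    result = {}
    for pair in feats.split("|"):
        # Split on the first '=' only; values such as 'Int,Rel' stay intact.
        name, _, value = pair.partition("=")
        result[name] = value
    return result


print(parse_feats("Polarity=Neg"))  # {'Polarity': 'Neg'}
print(parse_feats("_"))  # {}
```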

How to Use

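from transformers import AutoTokenizer, AutoModelForTokenClassification, TokenClassificationPipeline
# Load the character-level tokenizer and the token-classification (UPOS) model
tokenizer = AutoTokenizer.from_pretrained("KoichiYasuoka/roberta-base-japanese-char-luw-upos")
model = AutoModelForTokenClassification.from_pretrained("KoichiYasuoka/roberta-base-japanese-char-luw-upos")
# aggregation_strategy="simple" merges consecutive characters of the same tagged word into one group
pipeline = TokenClassificationPipeline(tokenizer=tokenizer, model=model, aggregation_strategy="simple")
# Return (word, tag) pairs by slicing the input text with each group's character offsets
nlp = lambda x: [(x[t["start"]:t["end"]], t["entity_group"]) for t in pipeline(x)]
print(nlp("国境の長いトンネルを抜けると雪国であった。"))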
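The model labels each character, and aggregation_strategy="simple" is what turns those per-character labels into long-unit words. The following sketch is a simplified re-implementation of that grouping rule for illustration only; it is not the transformers source. It assumes one token per character and labels of the form B-TAG for a word-initial character and TAG (or I-TAG) for a continuation character. The labels in the example are hand-written, not actual model output, and split_label and group_chars are hypothetical names.

```python
def split_label(label: str) -> tuple[str, str]:
    """Return ('B' or 'I', tag); unprefixed labels count as continuations."""
    if label.startswith("B-"):
        return "B", label[2:]
    if label.startswith("I-"):
        return "I", label[2:]
    return "I", label


def group_chars(text: str, labels: list[str]) -> list[tuple[str, str]]:
    """Merge per-character labels into (word, tag) pairs.

    A new word starts when the tag changes or a 'B-' label appears;
    assumes labels[i] belongs to text[i] (one token per character).
    """
    groups: list[tuple[str, str]] = []
    start = 0
    current_tag = ""
    for i, label in enumerate(labels):
        bi, tag = split_label(label)
        if i > 0 and (tag != current_tag or bi == "B"):
            # Close the previous word before starting a new one.
            groups.append((text[start:i], current_tag))
            start = i
        current_tag = tag
    if labels:
        groups.append((text[start:len(labels)], current_tag))
    return groups


# Hand-written labels for illustration, not real model predictions.
print(group_chars("国境の長い", ["B-NOUN", "NOUN", "B-ADP", "B-ADJ", "ADJ"]))
# [('国境', 'NOUN'), ('の', 'ADP'), ('長い', 'ADJ')]
```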

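Alternatively, the same sentence can be analyzed with esupar:

import esupar
# Load the model for tokenization, POS-tagging and dependency-parsing
nlp = esupar.load("KoichiYasuoka/roberta-base-japanese-char-luw-upos")
print(nlp("国境の長いトンネルを抜けると雪国であった。"))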
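If the printed result is CoNLL-U (ten tab-separated columns per word: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), it can be read back into Python records. This is a minimal sketch, assuming that str() of the object returned by nlp(...) yields that CoNLL-U text. Token and read_conllu are hypothetical names, and the sample below is a hand-written illustration rather than real esupar output.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    upos: str
    feats: str
    head: int
    deprel: str


def read_conllu(text: str) -> list[Token]:
    """Collect word lines from CoNLL-U text.

    Skips comments, blank lines, multiword-token ranges (e.g. '1-2')
    and empty nodes (e.g. '1.1'), whose IDs are not plain integers.
    """
    tokens = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        tokens.append(Token(int(cols[0]), cols[1], cols[3], cols[5], int(cols[6]), cols[7]))
    return tokens


# Hand-written sample for illustration, not actual model output.
sample = "\n".join([
    "# text = 雪国だ",
    "1\t雪国\t雪国\tNOUN\t_\t_\t0\troot\t_\t_",
    "2\tだ\tだ\tAUX\t_\t_\t1\tcop\t_\t_",
    "",
])
for tok in read_conllu(sample):
    print(tok.id, tok.form, tok.upos, tok.head, tok.deprel)
```

With esupar installed, read_conllu(str(nlp("国境の長いトンネルを抜けると雪国であった。"))) would apply the same reader to the sentence above, under the same assumption about the output format.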

Reference

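Koichi Yasuoka (安岡孝一): Transformersと国語研長単位による日本語係り受け解析モデルの製作 [Building Japanese dependency-parsing models with Transformers and NINJAL long-unit words], IPSJ SIG Technical Report (情報処理学会研究報告), Vol.2022-CH-128, No.7 (February 2022), pp.1-8.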

See Also

esupar: Tokenizer, POS-tagger and Dependency-parser with BERT/RoBERTa/DeBERTa models
