deberta-base-japanese-wikipedia-luw-upos

Model Description

This is a DeBERTa(V2) model pre-trained on Japanese Wikipedia and Aozora Bunko (青空文庫) texts for POS-tagging and dependency-parsing, derived from deberta-base-japanese-wikipedia. Every long-unit-word is tagged with UPOS (Universal Part-Of-Speech) and FEATS.

How to Use

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and the token-classification model from the Hugging Face Hub
name = "KoichiYasuoka/deberta-base-japanese-wikipedia-luw-upos"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name)
s = "国境の長いトンネルを抜けると雪国であった。"
t = tokenizer.tokenize(s)
# Predict one label per token; [1:-1] drops the special tokens at both ends
with torch.no_grad():
    logits = model(tokenizer.encode(s, return_tensors="pt"))["logits"]
p = [model.config.id2label[q] for q in torch.argmax(logits, dim=2)[0].tolist()[1:-1]]
print(list(zip(t, p)))

or

import esupar
nlp=esupar.load("KoichiYasuoka/deberta-base-japanese-wikipedia-luw-upos")
print(nlp("国境の長いトンネルを抜けると雪国であった。"))
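The first example above prints one (token, label) pair per subword token, not one per long-unit-word. The following is a minimal sketch of how such pairs might be merged back into words, assuming the labels mark multi-token words with `B-` and `I-` prefixes. That label scheme is an assumption and should be checked against `model.config.id2label`. The helper `merge_bio` and the sample labels are hypothetical and are not actual model output.

```python
from typing import Iterable, List, Tuple


def merge_bio(tokens: Iterable[str], labels: Iterable[str]) -> List[Tuple[str, str]]:
    """Merge subword tokens into words using B-/I- label prefixes.

    Assumption: a label starting with "I-" continues the previous word,
    and any other label (with or without "B-") starts a new word.
    """
    words: List[Tuple[str, str]] = []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("I-") and words:
            # Continuation: append the surface form to the previous word
            surface, tag = words[-1]
            words[-1] = (surface + tok, tag)
        else:
            # New word: strip the B-/I- prefix if present, keep the rest as-is
            tag = lab[2:] if lab.startswith(("B-", "I-")) else lab
            words.append((tok, tag))
    return words


# Hypothetical (token, label) pairs for illustration only
pairs = [("国境", "NOUN"), ("の", "ADP"), ("トン", "B-NOUN"), ("ネル", "I-NOUN")]
tokens, labels = zip(*pairs)
merged = merge_bio(tokens, labels)
print(merged)
```

With the variables from the first example, the call would be `merge_bio(t, p)`. Any FEATS information attached to a label is left untouched by this sketch.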

Reference

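安岡孝一 (Koichi Yasuoka): 青空文庫DeBERTaモデルによる国語研長単位係り受け解析 [Dependency parsing of NINJAL long-unit-words with Aozora Bunko DeBERTa models], 東洋学へのコンピュータ利用, 第35回研究セミナー [35th Research Seminar] (July 2022), pp.29-43.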

See Also

esupar: Tokenizer POS-tagger and Dependency-parser with BERT/RoBERTa/DeBERTa models
