
	---
	language: ja
	thumbnail: https://github.com/studio-ousia/luke/raw/master/resources/luke_logo.png
	tags:
	- luke
	- named entity recognition
	- entity typing
	- relation classification
	- question answering
	license: apache-2.0
	---

	## luke-japanese

	luke-japanese is the Japanese version of LUKE (Language Understanding with Knowledge-based Embeddings), a pre-trained _knowledge-enhanced_ contextualized representation of words and entities. LUKE treats words and entities in a given text as independent tokens, and outputs contextualized representations of them. Please refer to our [GitHub repository](https://github.com/studio-ousia/luke) for more details and updates.

	This model contains Wikipedia entity embeddings, which most general NLP tasks do not use. For tasks that do not take Wikipedia entities as inputs, please use the [lite version](https://huggingface.co/studio-ousia/luke-japanese-base-lite/).
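
Since LUKE takes entities as separate inputs alongside words, a minimal sketch of obtaining both kinds of representation may be useful. It assumes the Hugging Face `transformers` library (with `torch` and `sentencepiece` installed) and assumes that `MLukeTokenizer` and `LukeModel` are the appropriate classes for this checkpoint. The helper names `find_entity_spans` and `encode_with_entities` are hypothetical and exist only for this illustration.

```python
from typing import List, Tuple


def find_entity_spans(text: str, mentions: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets of the first occurrence of each mention."""
    spans = []
    for mention in mentions:
        start = text.find(mention)
        if start == -1:
            raise ValueError(f"mention not found in text: {mention!r}")
        spans.append((start, start + len(mention)))
    return spans


def encode_with_entities(text: str, entity_spans: List[Tuple[int, int]]):
    """Run the model and return (word representations, entity representations)."""
    # Imported lazily so the span helper above works without these packages.
    import torch
    from transformers import LukeModel, MLukeTokenizer  # assumed classes

    model_name = "studio-ousia/luke-japanese-base"
    tokenizer = MLukeTokenizer.from_pretrained(model_name)
    model = LukeModel.from_pretrained(model_name)
    model.eval()

    # entity_spans are character-level (start, end) offsets into `text`.
    inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state, outputs.entity_last_hidden_state


# Example (downloads the checkpoint on first use):
#   text = "東京は日本の首都です"
#   spans = find_entity_spans(text, ["東京", "日本"])
#   word_states, entity_states = encode_with_entities(text, spans)
```

Under these assumptions, the first returned tensor holds one vector per word token and the second holds one vector per entity span.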


	### Experimental results on JGLUE

	The following table shows results on the dev set of
	[JGLUE](https://github.com/yahoojapan/JGLUE):

	| Model                  | MARC-ja   | JSTS                | JNLI      | JCommonsenseQA |
	| ---------------------- | --------- | ------------------- | --------- | -------------- |
	|                        | acc       | Pearson/Spearman    | acc       | acc            |
	| LUKE Japanese base     | 0.965     | 0.916/0.877         | 0.912     | 0.842          |
	| _Baselines:_           |           |                     |           |                |
	| Tohoku BERT base       | 0.958     | 0.909/0.868         | 0.899     | 0.808          |
	| NICT BERT base         | 0.958     | 0.910/0.871         | 0.902     | 0.823          |
	| Waseda RoBERTa base    | 0.962     | 0.913/0.873         | 0.895     | 0.840          |
	| XLM RoBERTa base       | 0.961     | 0.877/0.831         | 0.893     | 0.687          |

	The baseline scores are taken from the
	[JGLUE README](https://github.com/yahoojapan/JGLUE/blob/a6832af23895d6faec8ecf39ec925f1a91601d62/README.md).
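
For readers unfamiliar with the metrics in the table, the sketch below shows how accuracy and the Pearson and Spearman correlation coefficients are commonly computed. It is a plain-Python illustration, not the JGLUE evaluation script, which may differ in details. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def accuracy(gold: Sequence, pred: Sequence) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))
```

Pearson measures linear agreement between predicted and gold similarity scores, whereas Spearman depends only on their ordering, which is presumably why JSTS reports both.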

	### Citation

	```bibtex
	@inproceedings{yamada2020luke,
	  title={LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention},
	  author={Ikuya Yamada and Akari Asai and Hiroyuki Shindo and Hideaki Takeda and Yuji Matsumoto},
	  booktitle={EMNLP},
	  year={2020}
	}
	```