---
language: ja
thumbnail: https://github.com/studio-ousia/luke/raw/master/resources/luke_logo.png
tags:
  - luke
  - named entity recognition
  - entity typing
  - relation classification
  - question answering
license: apache-2.0
---

## luke-japanese

**luke-japanese** is the Japanese version of **LUKE** (**L**anguage **U**nderstanding with **K**nowledge-based **E**mbeddings), a pre-trained _knowledge-enhanced_ Transformer model that produces contextualized representations of words and entities. LUKE treats words and entities in a given text as independent tokens and outputs a contextualized representation of each. Please refer to our [GitHub repository](https://github.com/studio-ousia/luke) for more details and updates.

This model includes Wikipedia entity embeddings, which most general NLP tasks do not use. For tasks that take only words as input and do not use Wikipedia entities, please use the [lite version](https://huggingface.co/studio-ousia/luke-japanese-base-lite/) instead.
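
The sketch below illustrates what "words and entities as independent tokens" means in practice: the tokenizer receives the text plus character-level spans of entity mentions, and the model returns one set of hidden states for words and another for entities. It makes several assumptions: that the `transformers`, `torch`, and `sentencepiece` packages are installed, that this model is published under the ID `studio-ousia/luke-japanese-base` (inferred from the lite version's URL, not stated above), and that the example sentence and the helper names `find_entity_spans` and `encode_with_entities` are hypothetical illustrations rather than part of the LUKE codebase.

```python
# Assumed model ID, inferred from the lite version's URL; adjust if it differs.
MODEL_NAME = "studio-ousia/luke-japanese-base"


def find_entity_spans(text, mentions):
    """Return character-level (start, end) spans for the first occurrence of each mention.

    Hypothetical helper: LUKE tokenizers accept entity spans as
    character offsets into the input text.
    """
    spans = []
    for mention in mentions:
        start = text.find(mention)
        if start == -1:
            raise ValueError(f"mention not found in text: {mention!r}")
        spans.append((start, start + len(mention)))
    return spans


def encode_with_entities(text, mentions, model_name=MODEL_NAME):
    """Encode text and entity mentions; return (word states, entity states)."""
    # Imported lazily so the span helper works without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    spans = find_entity_spans(text, mentions)
    # Entity spans are passed alongside the text, so each mention becomes
    # its own entity token next to the ordinary word tokens.
    inputs = tokenizer(text, entity_spans=spans, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state, outputs.entity_last_hidden_state


if __name__ == "__main__":
    # "Tokyo is the capital of Japan."
    sentence = "東京は日本の首都です"
    words, entities = encode_with_entities(sentence, ["東京", "日本"])
    print(words.shape, entities.shape)
```

Running the script downloads the model weights on first use. The second returned tensor holds one vector per entity span, which is the representation typically fed to a task-specific head for entity typing or relation classification.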

### Experimental results on JGLUE

The table below shows results on the dev set of
[JGLUE](https://github.com/yahoojapan/JGLUE):

| Model                  | MARC-ja   | JSTS                | JNLI      | JCommonsenseQA |
| ---------------------- | --------- | ------------------- | --------- | -------------- |
|                        | acc       | Pearson/Spearman    | acc       | acc            |
| **LUKE Japanese base** | **0.965** | **0.912**/**0.875** | **0.912** | **0.842**      |
| _Baselines:_           |           |                     |           |                |
| Tohoku BERT base       | 0.958     | 0.899/0.859         | 0.899     | 0.808          |
| NICT BERT base         | 0.958     | 0.903/0.867         | 0.902     | 0.823          |
| Waseda RoBERTa base    | 0.962     | 0.901/0.865         | 0.895     | 0.840          |
| XLM RoBERTa base       | 0.961     | 0.870/0.825         | 0.893     | 0.687          |

The baseline scores are taken from the [JGLUE repository](https://github.com/yahoojapan/JGLUE/tree/9f650417195ec54a080411f44e2395012979d42e#baseline-scores).
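
For readers unfamiliar with the metrics in the table, the following is a minimal pure-Python sketch of the three measures: accuracy (MARC-ja, JNLI, JCommonsenseQA) and the Pearson and Spearman correlations (JSTS). It is not the JGLUE evaluation script; the function names are hypothetical and the numbers in the usage section are toy values, not JGLUE data.

```python
from math import sqrt


def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the gold labels."""
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)


def pearson(xs, ys):
    """Pearson correlation: covariance normalized by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Toy similarity scores, for illustration only.
    gold = [0.0, 1.5, 3.0, 4.5]
    pred = [0.2, 1.0, 3.5, 4.0]
    print(pearson(gold, pred), spearman(gold, pred))
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

Pearson rewards predictions that are linearly related to the gold scores, while Spearman only checks that the ordering agrees, which is why the two JSTS numbers in each row differ.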

### Citation

```bibtex
@inproceedings{yamada2020luke,
  title={LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention},
  author={Ikuya Yamada and Akari Asai and Hiroyuki Shindo and Hideaki Takeda and Yuji Matsumoto},
  booktitle={EMNLP},
  year={2020}
}
```