File size: 3,221 Bytes
aef0fda
 
da9ea0c
5cff37b
f1aac0e
5b3503c
771f88d
5b3503c
61f41b6
 
5cff37b
f1aac0e
634b1ff
 
74135d6
 
5cff37b
462f2b7
 
da9ea0c
462f2b7
bfa2c07
462f2b7
61f41b6
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
bfa2c07
462f2b7
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
---
license: mit
---
# このモデルはLuke-japanese-base-liteをファインチューニングしたものです。
このモデルは文章がポジティブかネガティブかを分類することができます。
夏目漱石さんの文章(こころ、坊ちゃん、三四郎、etc)を日本語極性辞書
( http://www.cl.ecei.tohoku.ac.jp/Open_Resources-Japanese_Sentiment_Polarity_Dictionary.html )
を用いてポジティブ・ネガティブ判定したものを教師データとしてモデルの学習を行いました。
比較的長い文章(30語以上)において高い精度を発揮します。(単語など短い文章では低い正答率であることが確認されています。)
また使用した教師データから、口語より文語に対して高い正答率となることが期待されます。

# This model is based on Luke-japanese-base-lite
This model was fine-tuned model which besed on studio-ousia/Luke-japanese-base-lite.
This could be able to distinguish between positive and negative content.
This model was fine-tuned by using Natsume Souseki's documents.
For example Kokoro, Bocchan, Sanshiro and so on...

# what is Luke?[1] 
LUKE (Language Understanding with Knowledge-based Embeddings) is a new pre-trained contextualized representation of words and entities based on transformer. LUKE treats words and entities in a given text as independent tokens, and outputs contextualized representations of them. LUKE adopts an entity-aware self-attention mechanism that is an extension of the self-attention mechanism of the transformer, and considers the types of tokens (words or entities) when computing attention scores.

LUKE achieves state-of-the-art results on five popular NLP benchmarks including SQuAD v1.1 (extractive question answering), CoNLL-2003 (named entity recognition), ReCoRD (cloze-style question answering), TACRED (relation classification), and Open Entity (entity typing).

# how to use 使い方
-------------------------------------------------------------

import torch
from transformers import MLukeTokenizer
from torch import nn 

tokenizer = MLukeTokenizer.from_pretrained('studio-ousia/luke-japanese-base-lite')
model = torch.load('C:\\[My_luke_model_pn.pthのあるディレクトリ]\\My_luke_model_pn.pth')

text=input()

encoded_dict = tokenizer.encode_plus(
                        text,                     
                        return_attention_mask = True,   # Attention maksの作成
                        return_tensors = 'pt',     #  Pytorch tensorsで返す
                )

pre = model(encoded_dict['input_ids'], token_type_ids=None, attention_mask=encoded_dict['attention_mask'])
SOFTMAX=nn.Softmax(dim=0)
num=SOFTMAX(pre.logits[0])
if num[1]>0.5:
    print(str(num[1]))
    print('ポジティブ')
else:
    print(str(num[1]))
    print('ネガティブ')


-------------------------------------------------------------

# Citation
[1]@inproceedings{yamada2020luke,
  title={LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention},
  author={Ikuya Yamada and Akari Asai and Hiroyuki Shindo and Hideaki Takeda and Yuji Matsumoto},
  booktitle={EMNLP},
  year={2020}
}