---
{}
---

The L2AI-dictionary model is a fine-tuned checkpoint of [klue/bert-base](https://huggingface.co/klue/bert-base) for multiple choice, specifically for selecting the best dictionary definition of a given word in a sentence. Below is an example usage:

```python
import torch
from transformers import AutoModelForMultipleChoice, AutoTokenizer

model_name = "JesseStover/L2AI-dictionary-klue-bert-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMultipleChoice.from_pretrained(model_name)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Prompt: 'The definition of "강아지" in "κ°•μ•„μ§€λŠ” λ½€μ†‘λ½€μ†‘ν•˜λ‹€."
# ("The puppy is fluffy.") is ...'
prompt = "\"κ°•μ•„μ§€λŠ” λ½€μ†‘λ½€μ†‘ν•˜λ‹€.\"에 μžˆλŠ” \"강아지\"의 μ •μ˜λŠ” "

# Candidate definitions: '"(noun) a dog's young"' and '"(noun) an affectionate
# term used by parents or grandparents for a child or grandchild"'.
candidates = [
    "\"(λͺ…사) 개의 μƒˆλΌ\"μ˜ˆμš”.",
    "\"(λͺ…사) λΆ€λͺ¨λ‚˜ 할아버지, ν• λ¨Έλ‹ˆκ°€ μžμ‹μ΄λ‚˜ 손주λ₯Ό κ·€μ—¬μ›Œν•˜λ©΄μ„œ λΆ€λ₯΄λŠ” 말\"μ΄μ˜ˆμš”."
]

# Tokenize one (prompt, candidate) pair per choice.
inputs = tokenizer(
    [[prompt, candidate] for candidate in candidates],
    return_tensors="pt",
    padding=True
)

# Index of the correct candidate; only needed if you want a loss in the output.
labels = torch.tensor(0).unsqueeze(0)

# Add a batch dimension and move everything to the model's device.
with torch.no_grad():
    outputs = model(
        **{k: v.unsqueeze(0).to(device) for k, v in inputs.items()},
        labels=labels.to(device)
    )

# Softmax over the choice dimension yields one probability per definition.
print({i: float(x) for i, x in enumerate(outputs.logits.softmax(1)[0])})
```
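
The per-candidate probabilities can be reduced to a single predicted definition by taking the argmax over the choice dimension; a minimal sketch, reusing the `outputs` and `candidates` objects from the example above:

```python
# Pick the highest-probability candidate as the predicted definition
# (for this prompt, the "개의 μƒˆλΌ" / puppy sense is the expected winner).
best = outputs.logits.argmax(dim=1).item()
print(candidates[best])
```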

The training data was procured under the Creative Commons [CC BY-SA 2.0 KR](https://creativecommons.org/licenses/by-sa/2.0/kr/) license from the National Institute of Korean Language's [Basic Korean Dictionary](https://krdict.korean.go.kr) and [Standard Korean Dictionary](https://stdict.korean.go.kr/).