---
language: ja
tags:
- t5
- text2text-generation
- pilota
license: apache-2.0
---

# Pilota model for scud2query

A model for [Pilota](https://github.com/megagonlabs/pilota) trained with <https://github.com/megagonlabs/scud2query>.

- ``scud``
    - Fine-tuned from [t5-base-japanese-web (with Byte-fallback, 8K)](https://huggingface.co/megagonlabs/t5-base-japanese-web-8k)
    - The original model is distributed under [the Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)
- ``scorer``
    - Fine-tuned from [LINE DistilBERT Japanese](https://huggingface.co/line-corporation/line-distilbert-base-japanese)
    - The original model is distributed under [the Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Usage

1. Install [Pilota](https://github.com/megagonlabs/pilota)
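2. Prepare inputs (the example utterance means "It would be nice to have a refrigerator in the room. 【customer】 wants a hotel that has a rental car service.")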
    - Command

        ```bash
        echo -e '部屋に冷蔵庫があると良い。レンタカーサービスがあるホテルを【customer】が希望する。' | python -m pilota.convert.plain2request | tee input.jsonl
        ```

    - Output

        ```json
        {"context":null,"utterance":"部屋に冷蔵庫があると良い。レンタカーサービスがあるホテルを【customer】が希望する。","sentences":null,"meta":{}}
        ```

3. Feed it to Pilota
    - Command

        ```console
        pilota -m megagonlabs/pilota_scud2query --batch_size 1 --outlen 60 --nbest 1 --beam 5 < input.jsonl
        ```

    - Output (formatted with ``jq .``)

        ```json
        [
          {
            "scuds_nbest": [
              [
                "部屋に冷蔵庫がある。"
              ]
            ],
            "original_ranks": [
              0
            ],
            "scores": [
              0.9769772589206696
            ],
            "scores_detail": [
              {
                "OK": 0.9232575297355652,
                "incorrect_none": 0.0034886503126472235,
                "lack": 0.023772092536091805,
                "limited": 0.013821585103869438,
                "untruth": 0.04332486167550087
              }
            ],
            "sentence": "部屋に冷蔵庫があると良い。"
          },
          {
            "scuds_nbest": [
              [
                "レンタカーサービスがあるホテルだ。"
              ]
            ],
            "original_ranks": [
              0
            ],
            "scores": [
              0.9876023113727569
            ],
            "scores_detail": [
              {
                "OK": 0.9586743712425232,
                "incorrect_none": 0.004059707745909691,
                "lack": 0.0024317132774740458,
                "limited": 0.007630097679793835,
                "untruth": 0.04025880992412567
              }
            ],
            "sentence": "レンタカーサービスがあるホテルを【customer】が希望する。"
          }
        ]
        ```
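
If you prefer to script the input and output handling of the steps above, the following is a minimal Python 3.9+ sketch using only the standard library. The helper names `to_request_line` and `best_scuds` are hypothetical and not part of Pilota. The request fields are copied from the `plain2request` output shown above, and `SAMPLE_OUTPUT` is a trimmed copy of the formatted response (with `scores_detail` dropped). The sketch assumes that `scuds_nbest` and `scores` are aligned index by index; it does not assume the candidates are already sorted by score.

```python
import json


# Hypothetical helper (not part of Pilota's API): serialize one utterance into
# the request shape printed by plain2request above, as a single JSON Lines row.
def to_request_line(utterance: str) -> str:
    request = {"context": None, "utterance": utterance, "sentences": None, "meta": {}}
    # Compact separators and ensure_ascii=False reproduce the one-line form shown above.
    return json.dumps(request, ensure_ascii=False, separators=(",", ":"))


# Hypothetical helper: for each per-sentence result, return the source sentence
# and the SCUD candidate whose entry in "scores" is highest.
# Assumption: scuds_nbest[i] corresponds to scores[i].
def best_scuds(results: list[dict]) -> list[tuple[str, list[str]]]:
    picked = []
    for result in results:
        scores = result["scores"]
        best = max(range(len(scores)), key=lambda i: scores[i])
        picked.append((result["sentence"], result["scuds_nbest"][best]))
    return picked


# Trimmed copy of the formatted output above (scores_detail omitted).
SAMPLE_OUTPUT = [
    {
        "scuds_nbest": [["部屋に冷蔵庫がある。"]],
        "original_ranks": [0],
        "scores": [0.9769772589206696],
        "sentence": "部屋に冷蔵庫があると良い。",
    },
    {
        "scuds_nbest": [["レンタカーサービスがあるホテルだ。"]],
        "original_ranks": [0],
        "scores": [0.9876023113727569],
        "sentence": "レンタカーサービスがあるホテルを【customer】が希望する。",
    },
]

if __name__ == "__main__":
    # Print one request line, suitable for redirecting into input.jsonl.
    print(to_request_line("部屋に冷蔵庫があると良い。レンタカーサービスがあるホテルを【customer】が希望する。"))
    # Show the highest-scoring SCUD candidate for each sentence in the sample.
    for sentence, scuds in best_scuds(SAMPLE_OUTPUT):
        print(sentence, "->", " / ".join(scuds))
```

With `ensure_ascii=False` the Japanese text stays readable in `input.jsonl`; the default (`ensure_ascii=True`) would write `\uXXXX` escapes instead, which any JSON parser decodes to the same string.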

## License

Apache License 2.0