---
license: apache-2.0
tags:
- distilgpt2
- hearthstone
metrics:
- bleu
- dvitel/codebleu
- exact_match
- chrf
datasets:
- dvitel/hearthstone
model-index:
- name: h1
  results:
  - task:
      type: text-generation
      name: Python Code Synthesis
    dataset:
      type: dvitel/hearthstone
      name: HearthStone
      split: test
    metrics:
      - type: exact_match
        value: 0.21212121212121213
        name: Exact Match
      - type: bleu
        value: 0.9637468196180485
        name: BLEU        
      - type: dvitel/codebleu
        value: 0.8884667222252154
        name: CodeBLEU                
      - type: chrf
        value: 96.5942286007928
        name: chrF     
---

# h1

This model is a fine-tuned version of [distilgpt2](https://huggingface.co/distilgpt2) on the [hearthstone](https://huggingface.co/datasets/dvitel/hearthstone) dataset.
The training script is available in the [GitHub repo](https://github.com/dvitel/nlp-sem-parsing/blob/master/h1.py).
It achieves the following results on the evaluation set:
- Loss: 0.0890
- Exact Match: 0.1970
- Bleu: 0.9737
- Codebleu: 0.9172
- Ngram Match Score: 0.8984
- Weighted Ngram Match Score: 0.8985
- Syntax Match Score: 0.9293
- Dataflow Match Score: 0.9429
- Chrf: 97.5313
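
Exact match here presumably counts a prediction as correct only when the generated string is identical to the gold one; the other metrics give partial credit. A minimal sketch of such a string-level exact-match metric (the function name and sample values are illustrative, not taken from the evaluation code):

```python
def exact_match(predictions, references):
    """Fraction of predictions that are string-identical to their references."""
    assert len(predictions) == len(references)
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(references)

# illustrative values: 2 of 3 predictions match exactly
score = exact_match(["a = 1", "b = 2", "c = 3"], ["a = 1", "b = 9", "c = 3"])
```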

## Model description

DistilGPT2 fine-tuned on the HearthStone dataset, with the target Python code preprocessed into its dumped AST form (field names omitted). Example:
```python
# gold label
Module([ClassDef('Innervate', [Name('SpellCard', Load())], [], [FunctionDef('__init__', arguments([], [arg('self', None, None)], None, [], [], None, []), [Expr(Call(Attribute(Call(Name('super', Load()), [], []), '__init__', Load()), [Constant('Innervate', None), Constant(0, None), Attribute(Name('CHARACTER_CLASS', Load()), 'DRUID', Load()), Attribute(Name('CARD_RARITY', Load()), 'FREE', Load())], []))], [], None, None), FunctionDef('use', arguments([], [arg('self', None, None), arg('player', None, None), arg('game', None, None)], None, [], [], None, []), [Expr(Call(Attribute(Call(Name('super', Load()), [], []), 'use', Load()), [Name('player', Load()), Name('game', Load())], [])), If(Compare(Attribute(Name('player', Load()),'mana', Load()), [Lt()], [Constant(8, None)]), [AugAssign(Attribute(Name('player', Load()),'mana', Store()), Add(), Constant(2, None))], [Assign([Attribute(Name('player', Load()),'mana', Store())], Constant(10, None), None)])], [], None, None)], [])], [])
```
```python
# wrong prediction (example of errors remaining after training)
Module([ClassDef('Innervate', [Name('SpellCard', Load())], [], [FunctionDef('__init__', arguments([], [arg('self', None, None)], None, [], [], None, []), [Expr(Call(Attribute(Call(Name('super', Load()), [], []), '__init__', Load()), [Constant('Innervate', None), Constant(0, None), Attribute(Name('CHARACTER_CLASS', Load()), 'DRUID', Load()), Attribute(Name('CARD_RARITY', Load()), 'FREE', Load())], []))], [], None, None), FunctionDef('use', arguments([], [arg('self', None, None), arg('player', None, None), arg('game', None, None)], None, [], [], None, []), [Expr(Call(Attribute(Call(Name('super', Load()), [], []), 'use', Load()), [Name('player', Load()), Name('game', Load())], [])), For(Compare(Attribute(Name('player', Load()),'maxa', Load()), [Lt()], [Constant(10, None)]), [AugAssign(Attribute(Name('player', Load()),'mana', Store()), Add(), Constant(2, None))], Exign([Name(Name('player', Load()),'mana', Store())], Constant(None, None), None)],], [], None, None)], [])], [])
```
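
The linearized form above matches what the standard library produces with field names omitted, so the preprocessing can be sketched as a round trip through `ast.dump` and `ast.unparse` (Python 3.9+; the exact dump string varies slightly across Python versions, and the sample source line is illustrative):

```python
import ast

src = "player.mana = 10"
tree = ast.parse(src)

# Linearize the AST the way the card sources appear above
# (positional fields only, as with annotate_fields=False).
dumped = ast.dump(tree, annotate_fields=False)
# e.g. "Module([Assign([Attribute(Name('player', Load()), 'mana', Store())], Constant(10, None), None)], [])"

# The dump can be turned back into source: evaluate it against the
# ast module's namespace, fill in locations, and unparse.
rebuilt = eval(dumped, dict(vars(ast)))
source = ast.unparse(ast.fix_missing_locations(rebuilt))  # "player.mana = 10"
```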


## Intended uses & limitations

Synthesis of Python code for HearthStone cards from their card descriptions.

## Training and evaluation data

See the splits of the [hearthstone](https://huggingface.co/datasets/dvitel/hearthstone) dataset.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 17
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 200
- mixed_precision_training: Native AMP
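
With zero warmup steps, the `cosine` scheduler decays the learning rate from its base value to zero over training following half a cosine cycle; a minimal sketch of that decay (a simplification of the scheduler's behavior, not the training code itself):

```python
import math

def cosine_lr(step, total_steps, base_lr=2e-05):
    """Cosine-decayed learning rate with no warmup:
    base_lr at step 0, declining smoothly to 0 at total_steps."""
    progress = step / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# e.g. halfway through training the learning rate is half the base value:
mid_lr = cosine_lr(50, 100)  # ~1e-05
```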

### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Exact Match | Bleu   | Codebleu | Ngram Match Score | Weighted Ngram Match Score | Syntax Match Score | Dataflow Match Score | Chrf    |
|:-------------:|:------:|:-----:|:---------------:|:-----------:|:------:|:--------:|:-----------------:|:--------------------------:|:------------------:|:--------------------:|:-------:|
| 0.3871        | 11.94  | 1600  | 0.1043          | 0.0152      | 0.9499 | 0.8549   | 0.8089            | 0.8089                     | 0.8653             | 0.9366               | 95.4674 |
| 0.0752        | 23.88  | 3200  | 0.0784          | 0.1212      | 0.9640 | 0.8874   | 0.8525            | 0.8526                     | 0.8929             | 0.9516               | 96.7978 |
| 0.0448        | 35.82  | 4800  | 0.0717          | 0.1364      | 0.9693 | 0.9077   | 0.8782            | 0.8782                     | 0.9069             | 0.9674               | 97.2100 |
| 0.0308        | 47.76  | 6400  | 0.0752          | 0.1364      | 0.9702 | 0.9061   | 0.8808            | 0.8810                     | 0.9070             | 0.9554               | 97.1896 |
| 0.0223        | 59.7   | 8000  | 0.0762          | 0.1364      | 0.9724 | 0.9050   | 0.8877            | 0.8881                     | 0.9093             | 0.9348               | 97.4616 |
| 0.0166        | 71.64  | 9600  | 0.0762          | 0.1667      | 0.9733 | 0.9140   | 0.8948            | 0.8951                     | 0.9197             | 0.9461               | 97.4945 |
| 0.0128        | 83.58  | 11200 | 0.0793          | 0.1515      | 0.9728 | 0.9085   | 0.8911            | 0.8918                     | 0.9189             | 0.9321               | 97.4152 |
| 0.0104        | 95.52  | 12800 | 0.0822          | 0.1667      | 0.9732 | 0.9165   | 0.8946            | 0.8950                     | 0.9222             | 0.9541               | 97.4887 |
| 0.0084        | 107.46 | 14400 | 0.0832          | 0.1667      | 0.9737 | 0.9167   | 0.8970            | 0.8972                     | 0.9254             | 0.9471               | 97.5326 |
| 0.007         | 119.4  | 16000 | 0.0837          | 0.1818      | 0.9743 | 0.9160   | 0.8983            | 0.8986                     | 0.9238             | 0.9434               | 97.6638 |
| 0.0058        | 131.34 | 17600 | 0.0858          | 0.1818      | 0.9739 | 0.9200   | 0.8977            | 0.8977                     | 0.9267             | 0.9579               | 97.5583 |
| 0.005         | 143.28 | 19200 | 0.0878          | 0.1818      | 0.9743 | 0.9180   | 0.8993            | 0.9001                     | 0.9301             | 0.9426               | 97.5819 |
| 0.0044        | 155.22 | 20800 | 0.0877          | 0.1667      | 0.9736 | 0.9156   | 0.8957            | 0.8960                     | 0.9278             | 0.9429               | 97.5109 |
| 0.0042        | 167.16 | 22400 | 0.0890          | 0.1970      | 0.9736 | 0.9171   | 0.8984            | 0.8984                     | 0.9293             | 0.9424               | 97.5617 |
| 0.0038        | 179.1  | 24000 | 0.0891          | 0.2121      | 0.9738 | 0.9174   | 0.8991            | 0.8991                     | 0.9285             | 0.9429               | 97.5452 |
| 0.0037        | 191.04 | 25600 | 0.0890          | 0.1970      | 0.9737 | 0.9172   | 0.8984            | 0.8985                     | 0.9293             | 0.9429               | 97.5313 |


### Framework versions

- Transformers 4.24.0
- Pytorch 1.13.0
- Datasets 2.6.1
- Tokenizers 0.13.1