---
license: apache-2.0
---
### DreamBank Custom Architecture

This repo contains the weights for the custom architecture presented in the paper [Automatic Annotation of Dream Report’s Emotional Content with Large Language Models](https://aclanthology.org/2024.clpsych-1.7/).
A working example of how to load and use the model is given below. Please refer to the [Git repo](https://github.com/lorenzoscottb/Dream_Reports_Annotation/tree/main/Experiments/Supervised_Learning) for more details.

#### Use 

```py
import numpy as np
import pandas as pd
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import AutoConfig, BertTokenizerFast

# CustomDataset, BERT_PTM, validation and decode_clean are provided by SL_utils (see the Git repo above)
from SL_utils import *

# Emotion codes used as labels; their order defines the columns of the model's predictions
Coding_emotions = {
    "AN": "Anger",
    "AP": "Apprehension",
    "SD": "Sadness",
    "CO": "Confusion",
    "HA": "Happiness",
}

emotions_list = list(Coding_emotions.keys())

test_sentences = [
    "In my dream I was followed by the scary monster.",
    "I was walking in a forest, surrounded by singing birds. I was in calm and peace.",
]

# The dataset expects a target column; at inference time it is filled with dummy all-zero vectors
test_sentences_target = [[0] * len(emotions_list) for _ in test_sentences]
test_sentences_df = pd.DataFrame.from_dict(
    {
        "report": test_sentences,
        "Report_as_Multilabel": test_sentences_target,
    }
)
```
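The `Report_as_Multilabel` column holds one multi-hot vector per report, ordered like `emotions_list`: a `1` marks an emotion that is present. For unannotated reports the snippet above simply uses zeros. A minimal sketch of how annotated emotion codes could be turned into such vectors, assuming the code order shown above (the helper `to_multihot` is hypothetical and not part of `SL_utils`):

```python
# Label order assumed to match emotions_list in the snippet above
EMOTION_CODES = ["AN", "AP", "SD", "CO", "HA"]


def to_multihot(codes, label_order=EMOTION_CODES):
    """Hypothetical helper: return a 0/1 vector with a 1 for each emotion code present."""
    unknown = set(codes) - set(label_order)
    if unknown:
        raise ValueError(f"Unknown emotion codes: {sorted(unknown)}")
    present = set(codes)
    return [1 if code in present else 0 for code in label_order]


print(to_multihot(["AP", "CO"]))  # [0, 1, 0, 1, 0]
print(to_multihot([]))            # [0, 0, 0, 0, 0]
```

Because the task is multi-label, a report can carry any combination of emotions, including none, which is why a fixed-length vector is used rather than a single class index.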

```py
model_name   = "bert-large-cased"
model_config = AutoConfig.from_pretrained(model_name)
tokenizer    = BertTokenizerFast.from_pretrained(model_name, do_lower_case=False)
testing_set  = CustomDataset(test_sentences_df, tokenizer, max_length=512)

test_params = {
    'batch_size': 2,
    'shuffle': False,  # keep predictions in the same order as test_sentences_df
    'num_workers': 0
}

testing_loader  = DataLoader(testing_set, **test_params)

model = BERT_PTM(
    model_config,
    model_name=model_name,
    n_classes=len(emotions_list),
    freeze_BERT=False,
)

# Load the model's weights from the pre-trained checkpoint (mapped to CPU first, then moved to the GPU)
model.load_state_dict(torch.load("path/to/pytorch_model.bin", map_location="cpu"))
model.to("cuda")
```

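```py
# Run inference; with return_inputs=True the input ids are returned as well
outputs, targets, ids = validation(model, testing_loader, device="cuda", return_inputs=True)

# Threshold each per-class score at 0.5 to obtain a multi-label prediction
corr_outputs    = np.array(outputs) >= 0.5
corr_outputs_df = pd.DataFrame(corr_outputs, columns=emotions_list).astype(int)

# Decode the input ids back to text so each prediction sits next to its report
corr_outputs_df["report"] = [decode_clean(x, tokenizer) for x in tqdm(ids)]
```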
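The `>= 0.5` comparison turns each per-class score into an independent yes/no decision, so a single report can be assigned several emotions or none at all. A sketch of the same step in plain Python, assuming the model yields one probability per class in `emotions_list` order; the helper `decode_predictions` and the example scores are illustrative only, not real model output:

```python
CODING_EMOTIONS = {
    "AN": "Anger",
    "AP": "Apprehension",
    "SD": "Sadness",
    "CO": "Confusion",
    "HA": "Happiness",
}
EMOTION_CODES = list(CODING_EMOTIONS.keys())


def decode_predictions(probabilities, threshold=0.5):
    """Hypothetical helper: map per-class probabilities to the names of predicted emotions."""
    decoded = []
    for row in probabilities:
        if len(row) != len(EMOTION_CODES):
            raise ValueError("Each row needs one score per emotion code")
        # Each class is decided independently against the same threshold
        decoded.append(
            [CODING_EMOTIONS[code] for code, p in zip(EMOTION_CODES, row) if p >= threshold]
        )
    return decoded


# Made-up scores for two reports, for illustration only
scores = [
    [0.10, 0.92, 0.05, 0.61, 0.02],
    [0.03, 0.04, 0.08, 0.11, 0.87],
]
print(decode_predictions(scores))
# [['Apprehension', 'Confusion'], ['Happiness']]
```

Raising or lowering `threshold` trades recall for precision on every emotion at once; a per-class threshold would be a straightforward extension.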

### Cite 
If you use this model in your work or research, please cite as:
```bibtex
@inproceedings{bertolini-etal-2024-automatic,
    title = "Automatic Annotation of Dream Report{'}s Emotional Content with Large Language Models",
    author = "Bertolini, Lorenzo  and
      Elce, Valentina  and
      Michalak, Adriana  and
      Widhoelzl, Hanna-Sophia  and
      Bernardi, Giulio  and
      Weeds, Julie",
    booktitle = "Proceedings of the 9th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2024)",
    month = mar,
    year = "2024",
    address = "St. Julians, Malta",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.clpsych-1.7",
    pages = "92--107",
}
```