---
license: apache-2.0
tags:
- setfit
- sentence-transformers
- text-classification
pipeline_tag: text-classification
language:
- fr
library_name: setfit
metrics:
- f1
- accuracy
---

# binbin83/setfit-MiniLM-dialog-act-13nov

This model is a multi-label text classifier that distinguishes the different dialog acts in semi-structured interviews. The data used for fine-tuning were in French.

This is a [SetFit model](https://github.com/huggingface/setfit) that can be used for text classification. The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
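The contrastive step can be illustrated with a toy pair sampler. This is a simplified sketch of the idea only, not the actual SetFit sampler (which draws balanced positive and negative pairs), and the example sentences and labels are hypothetical:

```python
import random

# Toy labeled turns of speech (hypothetical examples)
examples = [
    ("Vous pouvez continuer", "FollowUp"),
    ("Pouvez-vous préciser ?", "Specifying"),
    ("Merci, passons à un autre sujet.", "Structuring"),
    ("Donc, si je comprends bien...", "Interpreting"),
]

def make_pairs(examples, num_iterations, seed=0):
    """For each example, sample `num_iterations` partner sentences and label
    the pair 1.0 when both share a dialog act, 0.0 otherwise."""
    rng = random.Random(seed)
    pairs = []
    for text, label in examples:
        for _ in range(num_iterations):
            other_text, other_label = rng.choice(examples)
            pairs.append((text, other_text, 1.0 if label == other_label else 0.0))
    return pairs

pairs = make_pairs(examples, num_iterations=20)  # 4 examples x 20 draws = 80 pairs
```

The sentence transformer is then fine-tuned so that same-label pairs are pulled together and different-label pairs pushed apart in embedding space.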

## Usage

To use this model for inference, first install the SetFit library:

```bash
python -m pip install setfit
```

You can then run inference as follows:

```python
from setfit import SetFitModel

# Download from Hub and run inference
model = SetFitModel.from_pretrained("binbin83/setfit-MiniLM-dialog-act-13nov")
label_dict = {'Introductory': 0, 'FollowUp': 1, 'Probing': 2, 'Specifying': 3, 'Structuring': 4, 'DirectQuestion': 5, 'Interpreting': 6, 'Ending': 7}

# Run inference: each prediction is a binary vector with one entry per label
preds = model(["Vous pouvez continuer", "Pouvez-vous me dire précisément quel a été l'ordre chronologique des événements ?"])

# Map each binary prediction vector back to label names
labels = [[name for name, p in zip(label_dict, pred) if p] for pred in preds]
```

## Labels and training data
Brinkmann and Kvale (1) define the following classification of dialog acts in interviews:
* Introductory: Can you tell me about ... (something specific)?
* Follow-up: repeat back keywords to participants; ask for reflection or unpacking of the point just made.
* Probing: Can you say a little more about X? Why do you think X is that way? Why do you think X is important?
* Specifying: Can you give me an example of X?
* Indirect: How do you think other people view X?
* Structuring: Thank you for that. I’d like to move to another topic...
* Direct (later stages): When you mention X, are you thinking like Y or Z?
* Interpreting: So, what I have gathered is that...
* Ending: I have asked all the questions I had, but I wanted to check whether there is something else about your experience or understanding we haven’t covered. Do you have any questions for me?

On our corpus of interviews, we manually labeled 500 turns of speech using this classification. We used 70% of the data for training and 30% for evaluation.

The label distribution over the entire corpus is: Probing (146), Specifying (135), FollowUp (134), DirectQuestion (125), Interpreting (44), Structuring (27), Introductory (12), Ending (12).


(1) Brinkmann, S., & Kvale, S. (2015). InterViews: Learning the Craft of Qualitative Research Interviewing. (3. ed.) SAGE Publications.
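The 70/30 split can be sketched as follows, using placeholder turn identifiers rather than the actual corpus:

```python
import random

turns = [f"turn_{i}" for i in range(500)]  # placeholders for the 500 labeled turns

rng = random.Random(0)
rng.shuffle(turns)  # shuffle before splitting so the split is random

split = round(0.7 * len(turns))  # 350
train, test = turns[:split], turns[split:]
# 350 turns for training, 150 for evaluation
```

In practice, given the strong class imbalance above (146 Probing vs. 12 Ending), a stratified split is preferable so that rare labels appear in both partitions.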


## Training and Performances

We fine-tuned "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
using SetFit with `CosineSimilarityLoss` and the following parameters: epochs = 5, batch_size = 32, num_iterations = 20.

On the test set of our custom dataset, we obtained:
{'f1': 0.65, 'f1_micro': 0.64, 'f1_sample': 0.64, 'accuracy': 0.475}
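To illustrate what the micro-averaged F1 reported above measures, here is a minimal computation over multi-label binary vectors; the gold and predicted vectors below are hypothetical, not the model's actual outputs:

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over multi-label binary vectors: pool true positives,
    false positives and false negatives across all labels, then compute F1 once."""
    flat = [(g, p) for gv, pv in zip(gold, pred) for g, p in zip(gv, pv)]
    tp = sum(g == 1 and p == 1 for g, p in flat)
    fp = sum(g == 0 and p == 1 for g, p in flat)
    fn = sum(g == 1 and p == 0 for g, p in flat)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

gold = [[1, 0, 0], [0, 1, 1]]  # hypothetical gold label vectors
pred = [[1, 0, 1], [0, 1, 0]]  # hypothetical predictions
score = micro_f1(gold, pred)   # tp=2, fp=1, fn=1 -> F1 = 2/3
```

Micro averaging weights every label decision equally, so frequent classes such as Probing dominate the score; the sample-averaged variant instead computes F1 per turn of speech and averages over turns.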


## BibTeX entry and citation info


To cite the current study:
```bibtex
@inproceedings{quillivic2024interview,
author = {Quillivic, Robin and Payet, Charles},
keywords = {NLP, JADT},
title = {Semi-Structured Interview Analysis: A French NLP Toolbox for Social Sciences},
booktitle = {JADT},
year = {2024},
copyright = {Creative Commons Attribution 4.0 International}
}
```


To cite the SetFit paper:
```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
doi = {10.48550/ARXIV.2209.11055},
url = {https://arxiv.org/abs/2209.11055},
author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Efficient Few-Shot Learning Without Prompts},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}
```