---
library_name: transformers
tags: []
pipeline_tag: fill-mask
widget:
 - text: "shop làm ăn như cái <mask>"
 - text: "hag từ Quảng <mask> kực nét"
 - text: "Set xinh quá, <mask> bèo nhèo"
 - text: "đúng nhận sai <mask>"
---

# 5CD-AI/viso-twhin-bert-large
## Overview
This model is TwHIN-BERT with its vocabulary reduced to 20k tokens on the UIT dataset, followed by 10 further epochs of pretraining.
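
The card does not describe how the vocabulary was reduced. One common approach, shown here purely as an illustrative sketch and not as the authors' procedure, is to keep the special tokens plus the most frequent token ids in the target corpus and then keep only the matching embedding rows. The function names and the toy data below are hypothetical.

```python
from collections import Counter


def build_reduced_vocab(token_id_corpus, special_ids, target_size):
    """Keep special ids plus the most frequent ids, up to target_size entries.

    Returns a dict mapping each surviving old id to its new compact id.
    """
    counts = Counter(tid for seq in token_id_corpus for tid in seq)
    kept = list(dict.fromkeys(special_ids))  # de-duplicate, preserve order
    kept_set = set(kept)
    for tid, _ in counts.most_common():
        if len(kept) >= target_size:
            break
        if tid not in kept_set:
            kept.append(tid)
            kept_set.add(tid)
    return {old: new for new, old in enumerate(kept)}


def slice_embeddings(embeddings, old_to_new):
    """Return embedding rows reordered so row new_id holds the old_id vector."""
    rows = [None] * len(old_to_new)
    for old, new in old_to_new.items():
        rows[new] = embeddings[old]
    return rows


# Toy data (assumption): a 6-token vocabulary where ids 0 and 1 are special.
corpus = [[0, 4, 4, 5, 5, 1], [0, 4, 2, 1]]
embeddings = [[float(i)] * 2 for i in range(6)]

old_to_new = build_reduced_vocab(corpus, special_ids=[0, 1], target_size=4)
reduced = slice_embeddings(embeddings, old_to_new)
```

In a real model the tokenizer's vocabulary file and the masked-LM output layer would need the same remapping. The continued pretraining mentioned above would then let the model adapt to the smaller vocabulary.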

Results on four downstream tasks over Vietnamese social media text: Emotion Recognition (UIT-VSMEC), Hate Speech Detection (UIT-HSD), Spam Reviews Detection (ViSpamReviews), and Hate Speech Spans Detection (ViHOS):
<table>
        <tr align="center">
            <td rowspan=2><b>Model</b></td>
            <td rowspan=2><b>Avg</b></td>
            <td colspan=3><b>Emotion Recognition</b></td>
            <td colspan=3><b>Hate Speech Detection</b></td>
            <td colspan=3><b>Spam Reviews Detection</b></td>
            <td colspan=3><b>Hate Speech Spans Detection</b></td>
        </tr>
        <tr align="center">
            <td><b>Acc</b></td>
            <td><b>WF1</b></td>
            <td><b>MF1</b></td>
            <td><b>Acc</b></td>
            <td><b>WF1</b></td>
            <td><b>MF1</b></td>
            <td><b>Acc</b></td>
            <td><b>WF1</b></td>
            <td><b>MF1</b></td>
            <td><b>Acc</b></td>
            <td><b>WF1</b></td>
            <td><b>MF1</b></td>
        </tr>
        <tr align="center">
            <td align="left">viBERT</td>
            <td>78.16</td>
            <td>61.91</td>
            <td>61.98</td>
            <td>59.7</td>
            <td>85.34</td>
            <td>85.01</td>
            <td>62.07</td>
            <td>89.93</td>
            <td>89.79</td>
            <td>76.8</td>
            <td>90.42</td>
            <td>90.45</td>
            <td>84.55</td>
        </tr>
        <tr align="center">
            <td align="left">vELECTRA</td>
            <td>79.23</td>
            <td>64.79</td>
            <td>64.71</td>
            <td>61.95</td>
            <td>86.96</td>
            <td>86.37</td>
            <td>63.95</td>
            <td>89.83</td>
            <td>89.68</td>
            <td>76.23</td>
            <td>90.59</td>
            <td>90.58</td>
            <td>85.12</td>
        </tr>
        <tr align="center">
            <td align="left">PhoBERT-Base </td>
            <td>79.3</td>
            <td>63.49</td>
            <td>63.36</td>
            <td>61.41</td>
            <td>87.12</td>
            <td>86.81</td>
            <td>65.01</td>
            <td>89.83</td>
            <td>89.75</td>
            <td>76.18</td>
            <td>91.32</td>
            <td>91.38</td>
            <td>85.92</td>
        </tr>
        <tr align="center">
            <td align="left">PhoBERT-Large</td>
            <td>79.82</td>
            <td>64.71</td>
            <td>64.66</td>
            <td>62.55</td>
            <td>87.32</td>
            <td>86.98</td>
            <td>65.14</td>
            <td>90.12</td>
            <td>90.03</td>
            <td>76.88</td>
            <td>91.44</td>
            <td>91.46</td>
            <td>86.56</td>
        </tr>
        <tr align="center">
            <td align="left">ViSoBERT</td>
            <td>81.58</td>
            <td>68.1</td>
            <td>68.37</td>
            <td>65.88</td>
            <td>88.51</td>
            <td>88.31</td>
            <td>68.77</td>
            <td>90.99</td>
            <td>90.92</td>
            <td>79.06</td>
            <td>91.62</td>
            <td>91.57</td>
            <td>86.8</td>
        </tr>
        <tr align="center">
            <td align="left">visobert-14gb-corpus</td>
            <td>82.2</td>
            <td>68.69</td>
            <td>68.75</td>
            <td>66.03</td>
            <td>88.79</td>
            <td>88.6</td>
            <td>69.57</td>
            <td>91.02</td>
            <td>90.88</td>
            <td>77.13</td>
            <td>93.69</td>
            <td>93.63</td>
            <td>89.66</td>
        </tr>
        <tr align="center">
            <td align="left">viso-twhin-bert-large</td>
            <td><b>83.87</b></td>
            <td><b>73.45</b></td>
            <td><b>73.14</b></td>
            <td><b>70.99</b></td>
            <td><b>88.86</b></td>
            <td><b>88.8</b></td>
            <td><b>70.81</b></td>
            <td><b>91.6</b></td>
            <td><b>91.47</b></td>
            <td><b>79.07</b></td>
            <td><b>94.08</b></td>
            <td><b>93.96</b></td>
            <td><b>90.22</b></td>
        </tr>
</table>
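
The card does not spell out the column abbreviations. Assuming Acc is accuracy and WF1 and MF1 are weighted- and macro-averaged F1 (the usual reading), the pure-Python sketch below shows how the three scores differ on a toy label set. It also checks that the Avg column matches the mean of the 12 task scores, using the viBERT row from the table. The helper names are hypothetical.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 for one class, computed from true/false positive and negative counts."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def scores(y_true, y_pred):
    """Return (accuracy, weighted F1, macro F1) over the labels in y_true."""
    labels = sorted(set(y_true))
    n = len(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    f1 = {c: per_class_f1(y_true, y_pred, c) for c in labels}
    macro = sum(f1.values()) / len(labels)  # every class counts equally
    weighted = sum(f1[c] * y_true.count(c) for c in labels) / n  # by support
    return acc, weighted, macro


# Toy labels (assumption): three examples of class "a", one of class "b".
acc, wf1, mf1 = scores(["a", "a", "a", "b"], ["a", "a", "b", "b"])

# viBERT row from the table: 4 tasks x (Acc, WF1, MF1).
vibert_row = [61.91, 61.98, 59.7, 85.34, 85.01, 62.07,
              89.93, 89.79, 76.8, 90.42, 90.45, 84.55]
vibert_avg = round(sum(vibert_row) / len(vibert_row), 2)
```

On imbalanced data, macro F1 drops faster than weighted F1 when a rare class is predicted poorly. That is one plausible reason the MF1 columns sit below the WF1 columns throughout the table.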

## Usage (HuggingFace Transformers)

Install the `transformers` package:
    
    pip install transformers

Then use the model for the fill-mask task:

```python
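from transformers import pipeline

# Load the fill-mask pipeline with this checkpoint.
model_path = "5CD-AI/viso-twhin-bert-large"
mask_filler = pipeline("fill-mask", model=model_path)

# Return the 10 most likely fillers for the <mask> position.
predictions = mask_filler("đúng nhận sai <mask>", top_k=10)
print(predictions)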
```
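
For a single-mask input, the fill-mask pipeline returns a list of dictionaries with the keys `score`, `token`, `token_str`, and `sequence`. The sketch below turns such a list into readable lines and drops low-confidence candidates. `format_predictions` and its `min_score` parameter are hypothetical helpers. The sample list is a hand-written placeholder, not real model output.

```python
def format_predictions(predictions, min_score=0.0):
    """Return one tab-separated line per candidate, highest score first."""
    lines = []
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    for pred in ranked:
        if pred["score"] < min_score:
            continue  # skip candidates below the confidence threshold
        token = pred["token_str"].strip()
        lines.append(f"{token}\t{pred['score']:.3f}\t{pred['sequence']}")
    return lines


# Placeholder input in the pipeline's output shape (values are made up).
example_predictions = [
    {"score": 0.25, "token": 11, "token_str": "tok_a", "sequence": "text tok_a"},
    {"score": 0.60, "token": 12, "token_str": "tok_b", "sequence": "text tok_b"},
    {"score": 0.05, "token": 13, "token_str": "tok_c", "sequence": "text tok_c"},
]
lines = format_predictions(example_predictions, min_score=0.2)
```

With the real pipeline, the same helper would be called as `format_predictions(mask_filler("đúng nhận sai <mask>", top_k=10))`.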

## Fine-tune Configuration
We fine-tune `5CD-AI/viso-twhin-bert-large` on the 4 downstream tasks using the `transformers` library with the following configuration:
- train_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 4
- weight_decay: 0.01
- optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- training_epochs: 30
- model_max_length: 128
- metric_for_best_model: wf1
- strategy: epoch
  
Each task additionally uses its own learning rate:

| Emotion Recognition | Hate Speech Detection | Spam Reviews Detection | Hate Speech Spans Detection |
| --- | --- | --- | --- |
| learning_rate: 1e-5 | learning_rate: 5e-6 | learning_rate: 1e-5 | learning_rate: 5e-6 |
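
The settings above can be collected into keyword arguments for `transformers.TrainingArguments`. This is a minimal sketch under assumptions: the mapping of `train_batch_size` to `per_device_train_batch_size` and the task keys are my own naming, and the ambiguous `strategy: epoch` entry is left out. `model_max_length` is omitted too, because it is a tokenizer setting rather than a training argument.

```python
# Shared settings, expressed with TrainingArguments parameter names.
SHARED = {
    "per_device_train_batch_size": 16,  # assumed meaning of train_batch_size
    "gradient_accumulation_steps": 4,
    "seed": 42,
    "weight_decay": 0.01,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "num_train_epochs": 30,
    "metric_for_best_model": "wf1",
}

# Per-task learning rates from the table (task keys are hypothetical).
LEARNING_RATE = {
    "emotion_recognition": 1e-5,
    "hate_speech_detection": 5e-6,
    "spam_reviews_detection": 1e-5,
    "hate_speech_spans_detection": 5e-6,
}


def training_kwargs(task):
    """Merge the shared settings with the learning rate of one task."""
    return {**SHARED, "learning_rate": LEARNING_RATE[task]}


def effective_batch_size(kwargs, num_devices=1):
    """Examples consumed per optimizer step; num_devices is an assumption."""
    return (kwargs["per_device_train_batch_size"]
            * kwargs["gradient_accumulation_steps"]
            * num_devices)


kwargs = training_kwargs("hate_speech_detection")
batch = effective_batch_size(kwargs)  # 16 * 4 * 1 on a single device
```

The resulting dictionary could be passed as `TrainingArguments(output_dir=..., **kwargs)`. On a single device, a batch size of 16 with 4 accumulation steps gives 64 examples per optimizer update.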