
Access to this model is gated: the repository is publicly visible, but you must log in to Hugging Face, agree to share your contact information, and accept the conditions before you can access its files and content.

PhoBERT Base model for the Legal domain

Experiments were run with Transformers 4.38.2.
Vi-Legal-PhoBert is a legal-domain model built on vinai/phobert-base-v2: starting from that checkpoint, pretraining was continued with a token-level masked language modeling (MLM) objective for 154,600 steps on a legal corpus, so that the model adapts to legal-domain text.
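The card names the objective but not its hyperparameters. A minimal sketch of token-level masking, assuming the standard BERT/RoBERTa recipe (15% of tokens selected, then 80% replaced by the mask token, 10% by a random token, 10% kept unchanged), which the card does not confirm; `mask_tokens` and the token ids below are hypothetical. In Transformers, `DataCollatorForLanguageModeling` with `mlm=True` applies this same recipe, whereas whole-word masking (the alternative to token-level masking) would select all sub-tokens of a word together.

```python
import random
from typing import FrozenSet, List, Optional, Tuple

# Label for positions the loss should skip; -100 is the default ignore_index of
# PyTorch's CrossEntropyLoss, which the Transformers MLM heads use.
IGNORE_INDEX = -100


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    special_ids: FrozenSet[int] = frozenset(),
    mlm_probability: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Token-level MLM masking: every non-special token is selected independently.

    A selected token becomes mask_id 80% of the time, a random id 10% of the time,
    and stays unchanged 10% of the time. Returns (inputs, labels), where labels keep
    the original id at selected positions and IGNORE_INDEX everywhere else.
    """
    if rng is None:
        rng = random.Random()
    inputs: List[int] = []
    labels: List[int] = []
    for tok in token_ids:
        if tok in special_ids or rng.random() >= mlm_probability:
            inputs.append(tok)  # not selected: no prediction target here
            labels.append(IGNORE_INDEX)
            continue
        labels.append(tok)  # the model has to recover this original id
        roll = rng.random()
        if roll < 0.8:
            inputs.append(mask_id)
        elif roll < 0.9:
            inputs.append(rng.randrange(vocab_size))
        else:
            inputs.append(tok)
    return inputs, labels


# Hypothetical ids: 0 and 2 play the role of <s> and </s>, 4 the role of <mask>.
inputs, labels = mask_tokens([0, 11, 12, 13, 14, 2], mask_id=4, vocab_size=100,
                             special_ids=frozenset({0, 2}), rng=random.Random(0))
```

Because each token is selected on its own, a multi-syllable word can end up only partly masked, which is what distinguishes token-level from whole-word masking.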

Usage

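Fill-mask example:

from transformers import AutoTokenizer, RobertaForMaskedLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("NghiemAbe/Vi-Legal-PhoBert")
model = RobertaForMaskedLM.from_pretrained("NghiemAbe/Vi-Legal-PhoBert")

# PhoBERT expects word-segmented Vietnamese input (syllables of one word joined by "_").
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask(f"Người lao_động có quyền đơn_phương chấm_dứt {tokenizer.mask_token} lao_động ."))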

Metrics

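I evaluated each model on my Dev-Legal-Dataset; the results are below. R@k is recall at rank k and MRR@k is mean reciprocal rank at rank k (higher is better for both), Length is the maximum input length in tokens, mul means multilingual, and Accuracy Masked is masked-token prediction accuracy. An "x" marks a value that was not reported.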

Model                          Parameters  Language  Length  R@1    R@5    R@10   R@20   R@100  MRR@5  MRR@10  MRR@20  MRR@100  Accuracy Masked
vinai/phobert-base-v2          125M        vi        256     0.266  0.482  0.601  0.702  0.841  0.356  0.372   0.379   0.382    0.522
FacebookAI/xlm-roberta-base    279M        mul       512     0.012  0.042  0.064  0.091  0.207  0.025  0.028   0.030   0.033    x
Geotrend/bert-base-vi-cased    179M        vi        512     0.098  0.175  0.202  0.241  0.356  0.131  0.136   0.139   0.142    x
NlpHUST/roberta-base-vn        x           vi        512     0.050  0.097  0.126  0.163  0.369  0.071  0.076   0.078   0.083    x
aisingapore/sealion-bert-base  x           mul       512     0.002  0.007  0.021  0.036  0.106  0.003  0.005   0.006   0.008    x
Vi-Legal-PhoBert               125M        vi        256     0.290  0.560  0.707  0.819  0.935  0.410  0.430   0.437   0.440    0.8401
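The card does not spell out its metric definitions or retrieval protocol. A minimal sketch of the usual definitions, assuming each query comes with a ranked list of retrieved document ids and a set of relevant ids: R@k as the share of relevant documents found in the top k, MRR@k as the mean reciprocal rank of the first relevant hit within the top k, and masked accuracy as top-1 accuracy over masked positions only. All function names are hypothetical, and the author's exact definitions may differ (some benchmarks report R@k as a per-query hit rate instead).

```python
from typing import Dict, List, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1 / rank of the first relevant document within the top k, else 0."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: List[Sequence[str]], qrels: List[Set[str]],
             ks: Tuple[int, ...] = (1, 5, 10, 20, 100)) -> Dict[str, float]:
    """Average R@k and MRR@k over queries (one ranked list and one relevant set per query)."""
    n = len(runs)
    scores: Dict[str, float] = {}
    for k in ks:
        scores[f"R@{k}"] = sum(recall_at_k(r, q, k) for r, q in zip(runs, qrels)) / n
        scores[f"MRR@{k}"] = sum(reciprocal_rank_at_k(r, q, k) for r, q in zip(runs, qrels)) / n
    return scores


def masked_accuracy(predicted: Sequence[int], labels: Sequence[int], ignore_index: int = -100) -> float:
    """Share of masked positions (label != ignore_index) whose predicted id is correct."""
    pairs = [(p, l) for p, l in zip(predicted, labels) if l != ignore_index]
    return sum(p == l for p, l in pairs) / len(pairs) if pairs else 0.0
```

For example, `evaluate([["d3", "d1", "d7"]], [{"d1"}], ks=(1, 3))` gives R@1 = 0 and MRR@1 = 0, since the only relevant document sits at rank 2, while R@3 = 1 and MRR@3 = 0.5.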

Model size: 135M parameters (Safetensors, F32 tensors).