---
language:
- vi
thumbnail: "https://raw.githubusercontent.com/kldarek/polbert/master/img/polbert.png"
tags:
- transformer
- sbert
- legaltext
- vietnamese
license: "mit"
datasets:
- vietnamese-legal-text
---

# Vietnamese Legal Text BERT

#### Table of contents
1. [Introduction](#introduction)
2. [Using Vietnamese Legal Text BERT](#transformers)
    - [Installation](#install2)
    - [Pre-trained models](#models2)
    - [Example usage](#usage2)

# Using Vietnamese Legal Text BERT

`hmthanh/VietnamLegalText-SBERT`

## Using Vietnamese Legal Text BERT with `transformers`

### Installation

- Install `transformers` with pip: `pip install transformers`
- Install `tokenizers` with pip: `pip install tokenizers`

### Pre-trained models

Model | #params | Arch. | Max length | Pre-training data
---|---|---|---|---
`hmthanh/VietnamLegalText-SBERT` | 135M | base | 256 | 20GB of texts

### Example usage

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the pre-trained model and its tokenizer from the Hugging Face Hub
model = AutoModel.from_pretrained("hmthanh/VietnamLegalText-SBERT")
tokenizer = AutoTokenizer.from_pretrained("hmthanh/VietnamLegalText-SBERT")

# English: "What is the fine for running a red light?"
sentence = 'Vượt đèn đỏ bị phạt bao nhiêu tiền?'

# Encode the sentence into token IDs and wrap it as a batch of one
input_ids = torch.tensor([tokenizer.encode(sentence)])

with torch.no_grad():
    # Returns a model output object; features[0] holds the token-level embeddings
    features = model(input_ids)
```
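
The example above yields one embedding per token. To compare two texts, SBERT-style models are commonly reduced to a single sentence vector by averaging the token embeddings and then scored with cosine similarity. The sketch below illustrates that logic in plain Python on toy vectors, so it runs without downloading the model. The helper names `mean_pool` and `cosine_similarity` are hypothetical, and whether this checkpoint was trained with mean pooling is an assumption, not something this model card states.

```python
import math
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the vectors of real tokens (mask == 1), skipping padding."""
    total = [0.0] * len(token_vectors[0])
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [component / count for component in total]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 2-dimensional "token embeddings"; the third token is padding.
question_tokens = [[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]]
question_mask = [1, 1, 0]
question_vector = mean_pool(question_tokens, question_mask)

# A second toy text consisting of a single token.
article_vector = mean_pool([[4.0, 2.0]], [1])

score = cosine_similarity(question_vector, article_vector)
print(question_vector, article_vector, score)
```

With the real model, the token vectors for the first sentence in a batch could be taken from `features[0][0].tolist()`. For a single unpadded sentence the attention mask is all ones, so the pooled vector is simply the average over all tokens.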