
bert-tiny-imdb

This model is a fine-tuned version of prajjwal1/bert-tiny on the imdb dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2775
  • Accuracy: 0.8944
  • Matthews Correlation: 0.7888
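As a rough sanity check on how the two metrics relate, the sketch below computes accuracy and the Matthews correlation from a binary confusion matrix. The counts are hypothetical: they assume 10,000 evaluation reviews (20% of the 50,000 described under Training), perfectly balanced classes, and errors spread evenly across both classes. None of these is stated in this card. Under those assumptions the counts reproduce the reported accuracy and Matthews correlation.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, Matthews correlation) for a binary confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical counts (assumption): 5,000 reviews per class,
# with 528 errors in each class.
acc, mcc = accuracy_and_mcc(tp=4472, tn=4472, fp=528, fn=528)
print(f"accuracy={acc:.4f}  mcc={mcc:.4f}")
```

For a balanced binary problem with symmetric errors, the Matthews correlation reduces to 2 × accuracy − 1, which is why the two reported numbers track each other so closely.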

Model description

This is the smallest BERT variant released by Google in this GitHub repo. It has 2 transformer layers and a hidden size of 128 (L=2, H=128), for a total of 4.39 million parameters.
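The parameter count can be approximated by hand. The sketch below tallies the weights of a BERT encoder with a two-class classification head. Only L=2 and H=128 come from this card; the vocabulary size, maximum position count, feed-forward width, and number of labels are assumed from the usual bert-tiny configuration and should be checked against the model's config.json.

```python
# Stated in this card.
LAYERS = 2
HIDDEN = 128

# Assumed values (typical bert-tiny configuration, not stated in this card).
VOCAB_SIZE = 30522
MAX_POSITIONS = 512
TYPE_VOCAB = 2
INTERMEDIATE = 512
NUM_LABELS = 2


def linear(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


layer_norm = 2 * HIDDEN  # scale and shift vectors

# Word, position, and token-type embeddings followed by a LayerNorm.
embeddings = (VOCAB_SIZE + MAX_POSITIONS + TYPE_VOCAB) * HIDDEN + layer_norm

# One encoder layer: Q, K, V, and output projections, then the feed-forward
# block, with a LayerNorm after each of the two sub-blocks.
encoder_layer = (
    4 * linear(HIDDEN, HIDDEN)
    + layer_norm
    + linear(HIDDEN, INTERMEDIATE)
    + linear(INTERMEDIATE, HIDDEN)
    + layer_norm
)

pooler = linear(HIDDEN, HIDDEN)
classifier = linear(HIDDEN, NUM_LABELS)

total = embeddings + LAYERS * encoder_layer + pooler + classifier
print(f"{total:,} parameters (~{total / 1e6:.2f}M)")
```

Under these assumptions most of the parameters sit in the word embeddings rather than in the two transformer layers.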

Intended uses & limitations

This model is intended for text classification on movie reviews or similar text. It can also serve as a starting point for other downstream tasks, such as:

  • Sentiment Analysis
  • Named Entity Recognition or Token Classification

This model should not be used for tasks other than those mentioned above, or for any language other than English.

How to use the Model

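PyTorch Model

from transformers import pipeline

# load the text-classification pipeline
tiny_bert = pipeline("text-classification", "arnabdhar/tinybert-imdb")

# example input (placeholder review text)
input_text = "This movie was a delight from start to finish."

# perform inference; inputs longer than 128 tokens are truncated
results = tiny_bert(input_text, truncation=True, max_length=128)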

ONNX Model

from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

# load tokenizer & model
model_name = "arnabdhar/tinybert-imdb"
tokenizer = AutoTokenizer.from_pretrained(model_name)
onnx_model = ORTModelForSequenceClassification.from_pretrained(model_name)

# build pipeline
tiny_bert_onnx = pipeline(
  task = "text-classification",
  tokenizer = tokenizer,
  model = onnx_model
)

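# example input (placeholder review text)
input_text = "This movie was a delight from start to finish."

# perform inference; inputs longer than 128 tokens are truncated
results = tiny_bert_onnx(input_text, truncation=True, max_length=128)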

Training

The model was fine-tuned on Google Colab using an NVIDIA V100 GPU for 9 epochs, which took around 12 minutes.

The imdb dataset provides 25,000 labeled reviews in each of its training and test sets. I combined both partitions and re-split the result in an 80:20 ratio for fine-tuning, which gave a larger training set.
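The split sizes implied by this description can be worked out directly. The sketch below is only arithmetic on the numbers given in this card; it assumes a single device and no gradient accumulation, neither of which is stated here.

```python
import math

# Figures from this card.
TRAIN_PARTITION = 25_000
TEST_PARTITION = 25_000
EVAL_PERCENT = 20          # the "20" in the 80:20 split
TRAIN_BATCH_SIZE = 32
NUM_EPOCHS = 9

total_examples = TRAIN_PARTITION + TEST_PARTITION
eval_examples = total_examples * EVAL_PERCENT // 100
train_examples = total_examples - eval_examples

# Assumption: one optimizer step per batch (single device, no accumulation).
steps_per_epoch = math.ceil(train_examples / TRAIN_BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS

print(f"train={train_examples}, eval={eval_examples}")
print(f"steps per epoch={steps_per_epoch}, total steps={total_steps}")
```

The resulting step counts agree with the Step column in the training results, which suggests the 80% portion was used for training and the 20% portion for evaluation.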

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 320
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 9
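To illustrate what a cosine schedule with a 0.1 warmup ratio means for the learning rate, here is a minimal sketch of the usual shape: linear warmup to the peak rate, then a half-cosine decay to zero. It is an approximation written from the hyperparameters above, not the exact scheduler code used in training. The total of 11,250 steps is taken from the last row of the training results, and the rounding of the warmup length is an assumption.

```python
import math

BASE_LR = 5e-5        # learning_rate
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio
TOTAL_STEPS = 11_250  # final step in the training results

# Assumption: the warmup length is the ratio times the total, rounded.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step):
    """Learning rate at a given optimizer step under warmup + cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return BASE_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, from 0 to 1.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rate at the end of each epoch (1,250 steps per epoch).
for epoch in range(1, 10):
    print(epoch, f"{lr_at(epoch * 1250):.2e}")
```

With these numbers the peak rate is reached partway through the first epoch, and the rate falls smoothly to zero by the end of epoch 9.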

Training results

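| Training Loss | Epoch | Step  | Validation Loss | Accuracy | Matthews Correlation |
|---------------|-------|-------|-----------------|----------|----------------------|
| 0.4927        | 1.0   | 1250  | 0.3557          | 0.8484   | 0.7016               |
| 0.298         | 2.0   | 2500  | 0.2874          | 0.8866   | 0.7732               |
| 0.2555        | 3.0   | 3750  | 0.2799          | 0.8912   | 0.7828               |
| 0.2132        | 4.0   | 5000  | 0.2775          | 0.8944   | 0.7888               |
| 0.1779        | 5.0   | 6250  | 0.3065          | 0.891    | 0.7835               |
| 0.1508        | 6.0   | 7500  | 0.3331          | 0.889    | 0.7811               |
| 0.1304        | 7.0   | 8750  | 0.3451          | 0.8926   | 0.7870               |
| 0.119         | 8.0   | 10000 | 0.3670          | 0.8915   | 0.7852               |
| 0.1118        | 9.0   | 11250 | 0.3655          | 0.891    | 0.7840               |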
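The results above can be scanned for the checkpoint with the lowest validation loss. The sketch below copies the per-epoch numbers into a list and picks the minimum. That the published weights correspond to this checkpoint is an inference from the matching metrics at the top of this card, not something the card states.

```python
# (epoch, training loss, validation loss, accuracy, Matthews correlation)
history = [
    (1, 0.4927, 0.3557, 0.8484, 0.7016),
    (2, 0.2980, 0.2874, 0.8866, 0.7732),
    (3, 0.2555, 0.2799, 0.8912, 0.7828),
    (4, 0.2132, 0.2775, 0.8944, 0.7888),
    (5, 0.1779, 0.3065, 0.8910, 0.7835),
    (6, 0.1508, 0.3331, 0.8890, 0.7811),
    (7, 0.1304, 0.3451, 0.8926, 0.7870),
    (8, 0.1190, 0.3670, 0.8915, 0.7852),
    (9, 0.1118, 0.3655, 0.8910, 0.7840),
]

# Select the epoch with the lowest validation loss.
best = min(history, key=lambda row: row[2])
best_epoch, _, best_val_loss, best_acc, best_mcc = best
print(f"best epoch={best_epoch}, val loss={best_val_loss}, "
      f"accuracy={best_acc}, mcc={best_mcc}")

# Gap between validation and training loss at each epoch.
gaps = [(epoch, round(val - train, 4)) for epoch, train, val, _, _ in history]
```

After the best epoch the training loss keeps falling while the validation loss rises, the usual sign of overfitting, so the later epochs add little.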

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0+cu118
  • Datasets 2.15.0
  • Tokenizers 0.15.0