---
tags:
- generated_from_trainer
model-index:
- name: bert-pretrained-wikitext-2-raw-v1
results: []
license: apache-2.0
datasets:
- wikitext
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: fill-mask
---
# BERT
This model is a pre-trained version of [BERT](https://huggingface.co/bert-base-uncased) on the [WikiText](https://huggingface.co/datasets/wikitext)
language modeling dataset, built for educational purposes (see the [Training BERT from Scratch series on Medium](https://medium.com/p/b048682c795f)).
It is not intended for production use of any kind, although it can be queried for demonstration purposes (see the sketch after the results below).
It achieves the following results on the evaluation set:
- Loss: 7.9307
- Masked Language Modeling (Masked LM) Accuracy: 0.1485
- Next Sentence Prediction (NSP) Accuracy: 0.7891
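As a quick sanity check, the checkpoint can be queried through the `fill-mask` pipeline. The snippet below is a minimal sketch; the repository id is an assumption based on this repo's name and should be replaced with the actual model path.

```python
from transformers import pipeline

# Assumed repository id; replace with the actual path of this checkpoint.
fill_mask = pipeline("fill-mask", model="dimpo/bert-pretrained-wikitext-2-raw-v1")

# The model has only been pre-trained on WikiText-2, so expect rough predictions.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 4))
```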
## Model description
BERT, which stands for Bidirectional Encoder Representations from Transformers, is a revolutionary Natural Language Processing (NLP) model developed
by Google in 2018. Its introduction marked a significant advance in the field, setting new state-of-the-art results across a wide range of NLP tasks;
many regard it as NLP's ImageNet moment.
BERT is pre-trained on a massive amount of text with a single goal: to learn what language is and how context shapes the meaning of words in a document.
As a result, the pre-trained model can be fine-tuned for specific tasks such as question answering or sentiment analysis.
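As a sketch of that fine-tuning step, the snippet below attaches a freshly initialized classification head to this encoder (the repository id is again an assumption); the head would still have to be trained on labeled data before it is useful.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed repository id; any checkpoint with this architecture works the same way.
checkpoint = "dimpo/bert-pretrained-wikitext-2-raw-v1"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# num_labels adds a new, randomly initialized classification head on top of the encoder.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

inputs = tokenizer("This movie was great!", return_tensors="pt")
print(model(**inputs).logits.shape)  # torch.Size([1, 2])
```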
## Intended uses & limitations
This repository contains the model trained for 20 epochs on the WikiText dataset. Please note that the model is not suitable for production use
and will not provide accurate predictions for Masked Language Modeling tasks.
## Training and evaluation data
The model was trained for 20 epochs on the [WikiText](https://huggingface.co/datasets/wikitext) language modeling dataset using the
`wikitext-2-raw-v1` subset.
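For reference, the same subset can be loaded with the 🤗 Datasets library:

```python
from datasets import load_dataset

# wikitext-2-raw-v1 is the small, untokenized (raw) configuration of WikiText.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

print(dataset)                       # train / validation / test splits
print(dataset["train"][10]["text"])  # a raw line of text from the training split
```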
## Training procedure
Training BERT is usually divided into two distinct phases. The first phase, known as "pre-training," familiarizes the model
with language structure and the contextual meaning of words. The second phase, "fine-tuning," adapts the model to specific, useful tasks.
The model available in this repository has only undergone the pre-training phase.
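For illustration, the sketch below shows the two pre-training objectives as they are exposed by `BertForPreTraining` in Transformers: a masked language modeling head and a next sentence prediction head. `bert-base-uncased` is used here only as a stand-in; this checkpoint exposes the same outputs.

```python
from transformers import BertTokenizer, BertForPreTraining

# bert-base-uncased stands in for this checkpoint; both expose the same two heads.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForPreTraining.from_pretrained("bert-base-uncased")

inputs = tokenizer("The cat sat on the mat.", "It was a sunny day.", return_tensors="pt")
outputs = model(**inputs)

# Masked LM head: a score for every vocabulary token at every position.
print(outputs.prediction_logits.shape)        # (1, sequence_length, vocab_size)
# NSP head: two scores (is-next vs. not-next) for the sentence pair.
print(outputs.seq_relationship_logits.shape)  # (1, 2)
```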
### Training hyperparameters
The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20
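As a rough sketch, the configuration above corresponds to `TrainingArguments` along these lines. Argument names follow Transformers 4.33, `output_dir` is a placeholder, and the Adam betas and epsilon listed above are the library defaults, so they are not set explicitly.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-pretrained-wikitext-2-raw-v1",  # placeholder output directory
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=20,
    evaluation_strategy="epoch",  # evaluate once per epoch, as in the table below
)
```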
### Training results
The table below illustrates the model's training progress across the 20 epochs.
| Training Loss | Epoch | Step | Validation Loss | Masked LM Accuracy | NSP Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:------------------:|:------------:|
| 7.9726 | 1.0 | 564 | 7.5680 | 0.1142 | 0.5 |
| 7.5085 | 2.0 | 1128 | 7.4155 | 0.1329 | 0.5557 |
| 7.4112 | 3.0 | 1692 | 7.3729 | 0.1380 | 0.5675 |
| 7.3352 | 4.0 | 2256 | 7.2816 | 0.1398 | 0.6060 |
| 7.2823 | 5.0 | 2820 | 7.1709 | 0.1414 | 0.6884 |
| 7.1828 | 6.0 | 3384 | 7.1503 | 0.1417 | 0.7109 |
| 7.0796 | 7.0 | 3948 | 7.0909 | 0.1431 | 0.7430 |
| 6.8699 | 8.0 | 4512 | 7.1666 | 0.1422 | 0.7238 |
| 6.7819 | 9.0 | 5076 | 7.2507 | 0.1467 | 0.7345 |
| 6.7269 | 10.0 | 5640 | 7.2654 | 0.1447 | 0.7484 |
| 6.6701 | 11.0 | 6204 | 7.3642 | 0.1439 | 0.7784 |
| 6.613 | 12.0 | 6768 | 7.5089 | 0.1447 | 0.7677 |
| 6.5577 | 13.0 | 7332 | 7.7611 | 0.1469 | 0.7655 |
| 6.5197 | 14.0 | 7896 | 7.5984 | 0.1465 | 0.7827 |
| 6.4626 | 15.0 | 8460 | 7.6738 | 0.1449 | 0.8030 |
| 6.4026 | 16.0 | 9024 | 7.7009 | 0.1457 | 0.7869 |
| 6.3861 | 17.0 | 9588 | 7.7586 | 0.1503 | 0.7955 |
| 6.3779 | 18.0 | 10152 | 7.7792 | 0.1494 | 0.8019 |
| 6.357 | 19.0 | 10716 | 7.8532 | 0.1479 | 0.7966 |
| 6.3354 | 20.0 | 11280 | 7.9307 | 0.1485 | 0.7891 |
### Framework versions
- Transformers 4.33.1
- Pytorch 2.0.1+cu118
- Datasets 2.14.5
- Tokenizers 0.13.3