	ALBERT
	======

	************* Changes from Original Implementation *************

	1. Remove sentence order in `run_pretraining.py`
	2. Modify `_is_start_piece_sp` function in `create_pretraining_data.py` to account for non-English languages.
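
For context on the second change: whole-word masking needs to know where a word begins in a SentencePiece sequence. The sketch below is not the code in `create_pretraining_data.py`. It is a minimal illustration, assuming the usual SentencePiece convention that a piece starting with `▁` (U+2581) opens a new word, and assuming standalone ASCII punctuation counts as its own word. The names `is_start_piece` and `group_pieces_into_words` are hypothetical.

```python
import string

# Assumption: SentencePiece marks the first piece of a word with U+2581.
WORD_BOUNDARY = "\u2581"
# Assumption: standalone ASCII punctuation pieces are treated as separate words.
PUNCTUATION = set(string.punctuation)


def is_start_piece(piece):
    """Return True if `piece` opens a new word, independent of script."""
    return piece.startswith(WORD_BOUNDARY) or piece in PUNCTUATION


def group_pieces_into_words(pieces):
    """Group piece indices into per-word lists, as whole-word masking needs."""
    words = []
    for index, piece in enumerate(pieces):
        if is_start_piece(piece) or not words:
            words.append([index])
        else:
            words[-1].append(index)
    return words


# Hindi example: the second word is split into two pieces.
pieces = ["\u2581नमस्ते", "\u2581दुनि", "या", "!"]
print(group_pieces_into_words(pieces))  # [[0], [1, 2], [3]]
```

Because the check relies only on the boundary marker and a punctuation set, it does not depend on an English alphabet, which is the kind of property a multilingual variant needs.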

	************* New March 28, 2020 *************

	Added a Colab [tutorial](https://github.com/google-research/albert/blob/master/albert_glue_fine_tuning_tutorial.ipynb) for running fine-tuning on the GLUE datasets.

	************* New January 7, 2020 *************

	v2 TF-Hub models should be working now with TF 1.15, as we removed the
	native Einsum op from the graph. See updated TF-Hub links below.

	************* New December 30, 2019 *************

	Chinese models are released. We would like to thank the [CLUE team](https://github.com/CLUEbenchmark/CLUE) for providing the training data.

	- [Base](https://storage.googleapis.com/albert_models/albert_base_zh.tar.gz)
	- [Large](https://storage.googleapis.com/albert_models/albert_large_zh.tar.gz)
	- [Xlarge](https://storage.googleapis.com/albert_models/albert_xlarge_zh.tar.gz)
	- [Xxlarge](https://storage.googleapis.com/albert_models/albert_xxlarge_zh.tar.gz)

	Version 2 of ALBERT models is released.

	- Base: [[Tar file](https://storage.googleapis.com/albert_models/albert_base_v2.tar.gz)] [[TF-Hub](https://tfhub.dev/google/albert_base/3)]
	- Large: [[Tar file](https://storage.googleapis.com/albert_models/albert_large_v2.tar.gz)] [[TF-Hub](https://tfhub.dev/google/albert_large/3)]
	- Xlarge: [[Tar file](https://storage.googleapis.com/albert_models/albert_xlarge_v2.tar.gz)] [[TF-Hub](https://tfhub.dev/google/albert_xlarge/3)]
	- Xxlarge: [[Tar file](https://storage.googleapis.com/albert_models/albert_xxlarge_v2.tar.gz)] [[TF-Hub](https://tfhub.dev/google/albert_xxlarge/3)]

	In this version, we apply 'no dropout', 'additional training data' and 'long training time' strategies to all models. We train ALBERT-base for 10M steps and other models for 3M steps.

	The results compared with the v1 models are as follows:

	| Model          | Average  | SQuAD1.1  | SQuAD2.0  | MNLI     | SST-2    | RACE     |
	|----------------|----------|-----------|-----------|----------|----------|----------|
	| V2             |          |           |           |          |          |          |
	| ALBERT-base    | 82.3     | 90.2/83.2 | 82.1/79.3 | 84.6     | 92.9     | 66.8     |
	| ALBERT-large   | 85.7     | 91.8/85.2 | 84.9/81.8 | 86.5     | 94.9     | 75.2     |
	| ALBERT-xlarge  | 87.9     | 92.9/86.4 | 87.9/84.1 | 87.9     | 95.4     | 80.7     |
	| ALBERT-xxlarge | 90.9     | 94.6/89.1 | 89.8/86.9 | 90.6     | 96.8     | 86.8     |
	| V1             |          |           |           |          |          |          |
	| ALBERT-base    | 80.1     | 89.3/82.3 | 80.0/77.1 | 81.6     | 90.3     | 64.0     |
	| ALBERT-large   | 82.4     | 90.6/83.9 | 82.3/79.4 | 83.5     | 91.7     | 68.5     |
	| ALBERT-xlarge  | 85.5     | 92.5/86.1 | 86.1/83.1 | 86.4     | 92.4     | 74.8     |
	| ALBERT-xxlarge | 91.0     | 94.8/89.3 | 90.2/87.4 | 90.8     | 96.9     | 86.5     |

	The comparison shows that for ALBERT-base, ALBERT-large, and ALBERT-xlarge, v2 is much better than v1, indicating the importance of applying the above three strategies. On average, ALBERT-xxlarge v2 is slightly worse than v1, for two reasons: 1) Training for an additional 1.5M steps (the only difference between the two models is training for 1.5M steps versus 3M steps) did not lead to a significant performance improvement. 2) For v1, we did a small hyperparameter search among the parameter sets given by BERT, RoBERTa, and XLNet. For v2, we simply adopt the parameters from v1 except for RACE, where we use a learning rate of 1e-5 and 0 [ALBERT DR](https://arxiv.org/pdf/1909.11942.pdf) (dropout rate for ALBERT in fine-tuning). The original (v1) RACE hyperparameters cause model divergence for v2 models. Given that the downstream tasks are sensitive to the fine-tuning hyperparameters, we should be careful about so-called slight improvements.
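
To make "much better" and "slightly worse" concrete, the following sketch subtracts the v1 Average column from the v2 Average column. The numbers are copied from the table above; the dictionary and function names are only illustrative.

```python
# Average scores copied from the v2/v1 comparison table.
v2_average = {"base": 82.3, "large": 85.7, "xlarge": 87.9, "xxlarge": 90.9}
v1_average = {"base": 80.1, "large": 82.4, "xlarge": 85.5, "xxlarge": 91.0}


def average_gain(size):
    """Return the v2 minus v1 difference in average score, in points."""
    return round(v2_average[size] - v1_average[size], 1)


gains = {size: average_gain(size) for size in v2_average}
for size, gain in gains.items():
    print(f"ALBERT-{size}: {gain:+.1f}")
```

The three smaller models gain between 2.2 and 3.3 points, while xxlarge moves by -0.1, which is well within the range that a different fine-tuning hyperparameter choice could account for.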

	ALBERT is "A Lite" version of BERT, a popular unsupervised language
	representation learning algorithm. ALBERT uses parameter-reduction techniques
	that allow for large-scale configurations, overcome previous memory limitations,
	and achieve better behavior with respect to model degradation.

	For a technical description of the algorithm, see our paper:

	[ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942)

	Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

	Release Notes
	=============

	- Initial release: 10/9/2019

	Results
	=======

	Performance of ALBERT on the GLUE benchmark using a single-model setup on the
	dev set:

	| Models            | MNLI     | QNLI     | QQP      | RTE      | SST      | MRPC     | CoLA     | STS      |
	|-------------------|----------|----------|----------|----------|----------|----------|----------|----------|
	| BERT-large        | 86.6     | 92.3     | 91.3     | 70.4     | 93.2     | 88.0     | 60.6     | 90.0     |
	| XLNet-large       | 89.8     | 93.9     | 91.8     | 83.8     | 95.6     | 89.2     | 63.6     | 91.8     |
	| RoBERTa-large     | 90.2     | 94.7     | 92.2     | 86.6     | 96.4     | 90.9     | 68.0     | 92.4     |
	| ALBERT (1M)       | 90.4     | 95.2     | 92.0     | 88.1     | 96.8     | 90.2     | 68.7     | 92.7     |
	| ALBERT (1.5M)     | 90.8     | 95.3     | 92.2     | 89.2     | 96.9     | 90.9     | 71.4     | 93.0     |

	Performance of ALBERT-xxl on the SQuAD and RACE benchmarks using a single-model
	setup:

	| Models                    | SQuAD1.1 dev  | SQuAD2.0 dev  | SQuAD2.0 test | RACE test (Middle/High) |
	|---------------------------|---------------|---------------|---------------|-------------------------|
	| BERT-large                | 90.9/84.1     | 81.8/79.0     | 89.1/86.3     | 72.0 (76.6/70.1)        |
	| XLNet                     | 94.5/89.0     | 88.8/86.1     | 89.1/86.3     | 81.8 (85.5/80.2)        |
	| RoBERTa                   | 94.6/88.9     | 89.4/86.5     | 89.8/86.8     | 83.2 (86.5/81.3)        |
	| UPM                       | -             | -             | 89.9/87.2     | -                       |
	| XLNet + SG-Net Verifier++ | -             | -             | 90.1/87.2     | -                       |
	| ALBERT (1M)               | 94.8/89.2     | 89.9/87.2     | -             | 86.0 (88.2/85.1)        |
	| ALBERT (1.5M)             | 94.8/89.3     | 90.2/87.4     | 90.9/88.1     | 86.5 (89.0/85.5)        |


	Pre-trained Models
	==================
	TF-Hub modules are available:

	- Base: [[Tar file](https://storage.googleapis.com/albert_models/albert_base_v1.tar.gz)] [[TF-Hub](https://tfhub.dev/google/albert_base/1)]
	- Large: [[Tar file](https://storage.googleapis.com/albert_models/albert_large_v1.tar.gz)] [[TF-Hub](https://tfhub.dev/google/albert_large/1)]
	- Xlarge: [[Tar file](https://storage.googleapis.com/albert_models/albert_xlarge_v1.tar.gz)] [[TF-Hub](https://tfhub.dev/google/albert_xlarge/1)]
	- Xxlarge: [[Tar file](https://storage.googleapis.com/albert_models/albert_xxlarge_v1.tar.gz)] [[TF-Hub](https://tfhub.dev/google/albert_xxlarge/1)]

	Example usage of the TF-Hub module in code:

	```
	import tensorflow_hub as hub

	# hub.Module is the TF1-style TF-Hub API.
	# is_training, input_ids, input_mask and segment_ids are supplied by the caller.
	tags = set()
	if is_training:
	    tags.add("train")
	albert_module = hub.Module("https://tfhub.dev/google/albert_base/1", tags=tags,
	                           trainable=True)
	albert_inputs = dict(
	    input_ids=input_ids,
	    input_mask=input_mask,
	    segment_ids=segment_ids)
	albert_outputs = albert_module(
	    inputs=albert_inputs,
	    signature="tokens",
	    as_dict=True)

	# If you want to use the token-level output, use
	# albert_outputs["sequence_output"] instead.
	output_layer = albert_outputs["pooled_output"]
	```
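
The three inputs in the snippet above are integer sequences of equal length. As a rough sketch of how they might be assembled for one example, assuming the BERT-style layout `[CLS] A [SEP] B [SEP]`: the function name `build_albert_inputs` is hypothetical, and the default ids for `[CLS]`, `[SEP]` and padding are illustrative assumptions that should be read from the actual SentencePiece model.

```python
def build_albert_inputs(ids_a, ids_b=None, max_seq_length=16,
                        cls_id=2, sep_id=3, pad_id=0):
    """Build padded input_ids, input_mask and segment_ids lists.

    The default special-token ids are illustrative assumptions.
    """
    input_ids = [cls_id] + list(ids_a) + [sep_id]
    segment_ids = [0] * len(input_ids)
    if ids_b:
        # The second segment and its closing separator get segment id 1.
        input_ids += list(ids_b) + [sep_id]
        segment_ids += [1] * (len(ids_b) + 1)
    if len(input_ids) > max_seq_length:
        raise ValueError("sequence longer than max_seq_length")
    # Real tokens are marked 1 in the mask, padding positions 0.
    input_mask = [1] * len(input_ids)
    padding = max_seq_length - len(input_ids)
    input_ids += [pad_id] * padding
    input_mask += [0] * padding
    segment_ids += [0] * padding
    return input_ids, input_mask, segment_ids


ids, mask, segments = build_albert_inputs([10, 11, 12], [20, 21], max_seq_length=10)
print(ids)       # [2, 10, 11, 12, 3, 20, 21, 3, 0, 0]
print(mask)      # [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
print(segments)  # [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
```

In practice these lists would be batched into tensors of shape `[batch_size, max_seq_length]` before being passed to the module.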

	Most of the fine-tuning scripts in this repository support TF-Hub modules
	via the `--albert_hub_module_handle` flag.

	Pre-training Instructions
	=========================
	To pretrain ALBERT, use `run_pretraining.py`:

	```
	pip install -r albert/requirements.txt
	python -m albert.run_pretraining \
	    --input_file=... \
	    --output_dir=... \
	    --init_checkpoint=... \
	    --albert_config_file=... \
	    --do_train \
	    --do_eval \
	    --train_batch_size=4096 \
	    --eval_batch_size=64 \
	    --max_seq_length=512 \
	    --max_predictions_per_seq=20 \
	    --optimizer='lamb' \
	    --learning_rate=.00176 \
	    --num_train_steps=125000 \
	    --num_warmup_steps=3125 \
	    --save_checkpoints_steps=5000
	```
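
As a quick sanity check on how these flag values relate to each other, the sketch below derives a few quantities from them. The flag values are copied from the command above; the derived names are illustrative, and the token count is an upper bound because it counts padding positions as well.

```python
# Flag values copied from the pretraining command.
train_batch_size = 4096
max_seq_length = 512
max_predictions_per_seq = 20
num_train_steps = 125000
num_warmup_steps = 3125
save_checkpoints_steps = 5000

# Share of training spent in learning-rate warmup.
warmup_fraction = num_warmup_steps / num_train_steps
# Upper bound on token positions processed per optimizer step.
tokens_per_step = train_batch_size * max_seq_length
# Largest share of positions in a sequence that can be masked.
masked_fraction = max_predictions_per_seq / max_seq_length
# Number of checkpoint intervals that fit in the run.
checkpoint_intervals = num_train_steps // save_checkpoints_steps
# Total sequences consumed over the whole run.
sequences_seen = train_batch_size * num_train_steps

print(f"warmup fraction:      {warmup_fraction:.3f}")
print(f"tokens per step:      {tokens_per_step:,}")
print(f"masked fraction:      {masked_fraction:.4f}")
print(f"checkpoint intervals: {checkpoint_intervals}")
print(f"sequences seen:       {sequences_seen:,}")
```

With these values, warmup covers 2.5% of training and at most about 3.9% of positions per sequence are masked.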

	Fine-tuning on GLUE
	===================
	To fine-tune and evaluate a pretrained ALBERT on GLUE, please see the
	convenience script `run_glue.sh`.

	Lower-level use cases may want to use the `run_classifier.py` script directly.
	The `run_classifier.py` script is used both for fine-tuning and evaluation of
	ALBERT on individual GLUE benchmark tasks, such as MNLI:

	```
	pip install -r albert/requirements.txt
	python -m albert.run_classifier \
	    --data_dir=... \
	    --output_dir=... \
	    --init_checkpoint=... \
	    --albert_config_file=... \
	    --spm_model_file=... \
	    --do_train \
	    --do_eval \
	    --do_predict \
	    --do_lower_case \
	    --max_seq_length=128 \
	    --optimizer=adamw \
	    --task_name=MNLI \
	    --warmup_step=1000 \
	    --learning_rate=3e-5 \
	    --train_step=10000 \
	    --save_checkpoints_steps=100 \
	    --train_batch_size=128
	```

	Good default flag values for each GLUE task can be found in `run_glue.sh`.

	You can fine-tune the model starting from TF-Hub modules instead of raw
	checkpoints by setting e.g.
	`--albert_hub_module_handle=https://tfhub.dev/google/albert_base/1` instead
	of `--init_checkpoint`.
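
If you launch fine-tuning from a wrapper script, the choice between the two starting points can be made explicit. The helper below is a hypothetical sketch, not part of this repository: `classifier_args` only assembles an argument list using flag names that appear in this README, and it omits the remaining flags from the command above for brevity.

```python
def classifier_args(task_name, data_dir, output_dir, spm_model_file,
                    init_checkpoint=None, hub_module_handle=None):
    """Build an argument list for `python -m albert.run_classifier`.

    Exactly one of `init_checkpoint` or `hub_module_handle` must be given.
    """
    if (init_checkpoint is None) == (hub_module_handle is None):
        raise ValueError(
            "pass exactly one of init_checkpoint or hub_module_handle")
    args = [
        "python", "-m", "albert.run_classifier",
        f"--task_name={task_name}",
        f"--data_dir={data_dir}",
        f"--output_dir={output_dir}",
        f"--spm_model_file={spm_model_file}",
    ]
    if hub_module_handle is not None:
        args.append(f"--albert_hub_module_handle={hub_module_handle}")
    else:
        args.append(f"--init_checkpoint={init_checkpoint}")
    return args


# The paths here are placeholders chosen for illustration.
command = classifier_args(
    "MNLI", "glue/MNLI", "out/mnli", "30k-clean.model",
    hub_module_handle="https://tfhub.dev/google/albert_base/1")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` once the other flags are appended.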

	You can find the `spm_model_file` in the tar files or under the assets folder of
	the TF-Hub module. The name of the model file is "30k-clean.model".

	After evaluation, the script should report some output like this:

	```
	*** Eval results ***
	global_step = ...
	loss = ...
	masked_lm_accuracy = ...
	masked_lm_loss = ...
	sentence_order_accuracy = ...
	sentence_order_loss = ...
	```
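
When comparing several runs, it can help to turn a report like this into a dictionary. The parser below is a minimal sketch that assumes only the `key = value` layout shown above; the function name `parse_eval_results` is hypothetical and the numbers in `sample` are made-up placeholders, not real results.

```python
def parse_eval_results(text):
    """Parse `key = value` lines into a dict, skipping lines without `=`."""
    results = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # e.g. the "*** Eval results ***" banner
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        try:
            results[key] = float(value)
        except ValueError:
            # Keep non-numeric values as plain strings.
            results[key] = value
    return results


# Placeholder values for illustration only.
sample = """*** Eval results ***
global_step = 10000
loss = 0.5
"""
print(parse_eval_results(sample))
```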

	Fine-tuning on SQuAD
	====================
	To fine-tune and evaluate a pretrained model on SQuAD v1, use the
	`run_squad_v1.py` script:

	```
	pip install -r albert/requirements.txt
	python -m albert.run_squad_v1 \
	    --albert_config_file=... \
	    --output_dir=... \
	    --train_file=... \
	    --predict_file=... \
	    --train_feature_file=... \
	    --predict_feature_file=... \
	    --predict_feature_left_file=... \
	    --init_checkpoint=... \
	    --spm_model_file=... \
	    --do_lower_case \
	    --max_seq_length=384 \
	    --doc_stride=128 \
	    --max_query_length=64 \
	    --do_train=true \
	    --do_predict=true \
	    --train_batch_size=48 \
	    --predict_batch_size=8 \
	    --learning_rate=5e-5 \
	    --num_train_epochs=2.0 \
	    --warmup_proportion=.1 \
	    --save_checkpoints_steps=5000 \
	    --n_best_size=20 \
	    --max_answer_length=30
	```
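
The `--max_seq_length` and `--doc_stride` flags interact when a passage is too long for a single input. The sketch below follows the sliding-window scheme used in the original BERT SQuAD preprocessing; that this repository matches it exactly is an assumption, as is the reservation of three positions for `[CLS]` and two `[SEP]` tokens. The function name `split_into_windows` is hypothetical.

```python
def split_into_windows(num_doc_tokens, num_query_tokens,
                       max_seq_length=384, doc_stride=128):
    """Return (start, length) document spans, one per model input."""
    # Assumption: three positions are reserved for [CLS] and two [SEP].
    max_tokens_for_doc = max_seq_length - num_query_tokens - 3
    if max_tokens_for_doc <= 0:
        raise ValueError("query leaves no room for document tokens")
    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_tokens_for_doc)
        spans.append((start, length))
        if start + length == num_doc_tokens:
            break  # this window reaches the end of the document
        start += min(length, doc_stride)
    return spans


# A 600-token passage with a 20-token question.
print(split_into_windows(num_doc_tokens=600, num_query_tokens=20))
# [(0, 361), (128, 361), (256, 344)]
```

A smaller stride produces more overlapping windows, so an answer is less likely to be cut at a window boundary, at the cost of more inputs per passage.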

	You can fine-tune the model starting from TF-Hub modules instead of raw
	checkpoints by setting e.g.
	`--albert_hub_module_handle=https://tfhub.dev/google/albert_base/1` instead
	of `--init_checkpoint`.

	For SQuAD v2, use the `run_squad_v2.py` script:

	```
	pip install -r albert/requirements.txt
	python -m albert.run_squad_v2 \
	    --albert_config_file=... \
	    --output_dir=... \
	    --train_file=... \
	    --predict_file=... \
	    --train_feature_file=... \
	    --predict_feature_file=... \
	    --predict_feature_left_file=... \
	    --init_checkpoint=... \
	    --spm_model_file=... \
	    --do_lower_case \
	    --max_seq_length=384 \
	    --doc_stride=128 \
	    --max_query_length=64 \
	    --do_train \
	    --do_predict \
	    --train_batch_size=48 \
	    --predict_batch_size=8 \
	    --learning_rate=5e-5 \
	    --num_train_epochs=2.0 \
	    --warmup_proportion=.1 \
	    --save_checkpoints_steps=5000 \
	    --n_best_size=20 \
	    --max_answer_length=30
	```
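
Unlike the GLUE and RACE commands, the SQuAD commands specify training length in epochs and warmup as a proportion. A plausible way to convert these into step counts, following the convention of the original BERT SQuAD script, is sketched below. Whether the scripts here compute it identically is an assumption, and the feature count of 96000 is a made-up figure for illustration.

```python
def squad_schedule(num_train_features, train_batch_size=48,
                   num_train_epochs=2.0, warmup_proportion=0.1):
    """Convert epochs and warmup proportion into integer step counts."""
    num_train_steps = int(
        num_train_features / train_batch_size * num_train_epochs)
    num_warmup_steps = int(num_train_steps * warmup_proportion)
    return num_train_steps, num_warmup_steps


# Hypothetical number of training features after windowing.
steps, warmup = squad_schedule(96000)
print(steps, warmup)  # 4000 400
```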

	You can fine-tune the model starting from TF-Hub modules instead of raw
	checkpoints by setting e.g.
	`--albert_hub_module_handle=https://tfhub.dev/google/albert_base/1` instead
	of `--init_checkpoint`.

	Fine-tuning on RACE
	===================
	For RACE, use the `run_race.py` script:

	```
	pip install -r albert/requirements.txt
	python -m albert.run_race \
	    --albert_config_file=... \
	    --output_dir=... \
	    --train_file=... \
	    --eval_file=... \
	    --data_dir=... \
	    --init_checkpoint=... \
	    --spm_model_file=... \
	    --max_seq_length=512 \
	    --max_qa_length=128 \
	    --do_train \
	    --do_eval \
	    --train_batch_size=32 \
	    --eval_batch_size=8 \
	    --learning_rate=1e-5 \
	    --train_step=12000 \
	    --warmup_step=1000 \
	    --save_checkpoints_steps=100
	```

	You can fine-tune the model starting from TF-Hub modules instead of raw
	checkpoints by setting e.g.
	`--albert_hub_module_handle=https://tfhub.dev/google/albert_base/1` instead
	of `--init_checkpoint`.

	SentencePiece
	=============
	Command for generating the SentencePiece vocabulary:

	```
	spm_train \
	    --input all.txt --model_prefix=30k-clean --vocab_size=30000 --logtostderr \
	    --pad_id=0 --unk_id=1 --eos_id=-1 --bos_id=-1 \
	    --control_symbols=[CLS],[SEP],[MASK] \
	    --user_defined_symbols="(,),\",-,.,–,£,€" \
	    --shuffle_input_sentence=true --input_sentence_size=10000000 \
	    --character_coverage=0.99995 --model_type=unigram
	```