
# Vocabulary Trimmed [lmqg/mt5-base-esquad-ae](https://huggingface.co/lmqg/mt5-base-esquad-ae): `lmqg/mt5-base-esquad-ae-trimmed-50000`
This model is a vocabulary-trimmed version of [lmqg/mt5-base-esquad-ae](https://huggingface.co/lmqg/mt5-base-esquad-ae), produced with [`vocabtrimmer`](https://github.com/asahi417/lm-vocab-trimmer), a tool that trims the vocabulary of language models to reduce model size.
The following table summarizes the trimming process. Compression rates give the trimmed size as a percentage of the original.

|                            | lmqg/mt5-base-esquad-ae   | lmqg/mt5-base-esquad-ae-trimmed-50000   |
|:---------------------------|:--------------------------|:----------------------------------------|
| parameter_size_full        | 582,384,384               | 275,032,320                             |
| parameter_size_embedding   | 384,155,136               | 76,803,072                              |
| vocab_size                 | 250,101                   | 50,002                                  |
| compression_rate_full      | 100.0                     | 47.23                                   |
| compression_rate_embedding | 100.0                     | 19.99                                   |
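
The compression rates in the table can be reproduced from the parameter counts. The sketch below is a minimal check, not part of `vocabtrimmer`. The helper names (`compression_rate`, `embedding_params`) are hypothetical. The hidden size of 768 and the factor of 2 (separate input and output embedding matrices) are assumptions inferred from the figures in the table, not stated in the model card.

```python
# Figures copied from the summary table.
original = {"full": 582_384_384, "embedding": 384_155_136, "vocab": 250_101}
trimmed = {"full": 275_032_320, "embedding": 76_803_072, "vocab": 50_002}


def compression_rate(trimmed_count, original_count):
    """Trimmed size as a percentage of the original, rounded to 2 decimals."""
    return round(100 * trimmed_count / original_count, 2)


# Assumption: hidden size 768 and two vocabulary-sized matrices
# (input embedding and output projection). Inferred from the table.
HIDDEN_SIZE = 768


def embedding_params(vocab_size):
    """Embedding parameter count under the assumption above."""
    return vocab_size * HIDDEN_SIZE * 2


rate_full = compression_rate(trimmed["full"], original["full"])
rate_embedding = compression_rate(trimmed["embedding"], original["embedding"])

# Parameters outside the embeddings should be unchanged by trimming.
non_embedding_original = original["full"] - original["embedding"]
non_embedding_trimmed = trimmed["full"] - trimmed["embedding"]

print(rate_full, rate_embedding)
print(non_embedding_original == non_embedding_trimmed)
```

Under these assumptions the embedding counts match the table for both vocabulary sizes, and the non-embedding parameter count is identical before and after trimming, which suggests that the size reduction comes from the embedding matrices alone.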


The following table shows the parameters used to trim the vocabulary.

| language   | dataset                     | dataset_column   | dataset_name   | dataset_split   |   target_vocab_size |   min_frequency |
|:-----------|:----------------------------|:-----------------|:---------------|:----------------|--------------------:|----------------:|
| es         | vocabtrimmer/mc4_validation | text             | es             | validation      |               50000 |               2 |
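
To illustrate how `target_vocab_size` and `min_frequency` might interact, the sketch below selects tokens by corpus frequency. It is an illustration only and does not reproduce the `vocabtrimmer` implementation. The function `select_vocab`, the toy corpus, the "at least `min_frequency` occurrences" threshold, and the tie-breaking rule are all assumptions.

```python
from collections import Counter


def select_vocab(token_ids_per_doc, target_vocab_size, min_frequency):
    """Return up to `target_vocab_size` token ids, most frequent first.

    Hypothetical helper: assumes a token is eligible when it occurs at
    least `min_frequency` times across the corpus.
    """
    counts = Counter()
    for ids in token_ids_per_doc:
        counts.update(ids)

    # Drop tokens that are too rare to keep.
    eligible = [(tok, n) for tok, n in counts.items() if n >= min_frequency]

    # Sort by descending frequency; break ties by token id for determinism.
    eligible.sort(key=lambda item: (-item[1], item[0]))

    return [tok for tok, _ in eligible[:target_vocab_size]]


# Toy corpus of already-tokenized documents (token ids are made up).
corpus = [[5, 7, 7, 9], [7, 5, 11], [9, 9, 13]]
kept = select_vocab(corpus, target_vocab_size=2, min_frequency=2)
print(kept)
```

In this toy corpus, tokens 11 and 13 occur once and are filtered out by `min_frequency=2`, and the remaining tokens are then cut to the two most frequent.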