
# Vocabulary Trimmed [google/mt5-small](https://huggingface.co/google/mt5-small): `vocabtrimmer/mt5-small-trimmed-it-90000`
This model is a trimmed version of [google/mt5-small](https://huggingface.co/google/mt5-small) produced by [`vocabtrimmer`](https://github.com/asahi417/lm-vocab-trimmer), a tool that trims the vocabulary of a language model to reduce its size.
The following table summarizes the trimming process.

|                            | google/mt5-small   | vocabtrimmer/mt5-small-trimmed-it-90000   |
|:---------------------------|:-------------------|:------------------------------------------|
| parameter_size_full        | 300,176,768        | 136,223,104                               |
| parameter_size_embedding   | 256,114,688        | 92,161,024                                |
| vocab_size                 | 250,112            | 90,001                                    |
| compression_rate_full      | 100.0              | 45.38                                     |
| compression_rate_embedding | 100.0              | 35.98                                     |
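
The compression rates in the table can be reproduced from the parameter counts. The sketch below is only an arithmetic check, assuming each rate is the trimmed size expressed as a percentage of the original size; the constant and function names are illustrative and are not part of `vocabtrimmer`.

```python
# Parameter counts and vocabulary sizes copied from the summary table.
FULL_ORIGINAL = 300_176_768
FULL_TRIMMED = 136_223_104
EMB_ORIGINAL = 256_114_688
EMB_TRIMMED = 92_161_024
VOCAB_ORIGINAL = 250_112
VOCAB_TRIMMED = 90_001


def compression_rate(trimmed: int, original: int) -> float:
    """Trimmed size as a percentage of the original, rounded to 2 decimals.

    Assumption: this is how the table's compression_rate_* rows are defined.
    """
    return round(100 * trimmed / original, 2)


rate_full = compression_rate(FULL_TRIMMED, FULL_ORIGINAL)
rate_embedding = compression_rate(EMB_TRIMMED, EMB_ORIGINAL)

# Parameters outside the embedding matrices, before and after trimming.
non_embedding_original = FULL_ORIGINAL - EMB_ORIGINAL
non_embedding_trimmed = FULL_TRIMMED - EMB_TRIMMED

# Embedding parameters per vocabulary entry, before and after trimming.
per_token_original = EMB_ORIGINAL // VOCAB_ORIGINAL
per_token_trimmed = EMB_TRIMMED // VOCAB_TRIMMED

print(rate_full, rate_embedding)
print(non_embedding_original == non_embedding_trimmed)
print(per_token_original, per_token_trimmed)
```

The non-embedding parameter count is the same for both models, so the whole size reduction comes from the embedding parameters, which scale linearly with the vocabulary size.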


The following table shows the parameters used to trim the vocabulary.

| language   | dataset                     | dataset_column   | dataset_name   | dataset_split   |   target_vocab_size |   min_frequency |
|:-----------|:----------------------------|:-----------------|:---------------|:----------------|--------------------:|----------------:|
| it         | vocabtrimmer/mc4_validation | text             | it             | validation      |               90000 |               2 |
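
To illustrate how `target_vocab_size` and `min_frequency` could interact, here is a minimal sketch of frequency-based vocabulary selection. It is not the `vocabtrimmer` implementation: the function `select_vocab`, the whitespace tokenizer, and the toy corpus are all hypothetical, and it assumes that `min_frequency` drops tokens seen fewer times than the threshold before the most frequent `target_vocab_size` tokens are kept.

```python
from collections import Counter
from typing import Callable, Iterable, List


def select_vocab(
    corpus: Iterable[str],
    tokenize: Callable[[str], List[str]],
    target_vocab_size: int,
    min_frequency: int,
) -> List[str]:
    """Hypothetical helper: keep the most frequent tokens of a corpus.

    Tokens seen fewer than `min_frequency` times are dropped first, then
    at most `target_vocab_size` of the remaining tokens are kept.
    """
    counts = Counter(tok for text in corpus for tok in tokenize(text))
    frequent = [(tok, n) for tok, n in counts.items() if n >= min_frequency]
    # Sort by descending frequency; break ties alphabetically so the
    # result is deterministic.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    return [tok for tok, _ in frequent[:target_vocab_size]]


# Toy Italian corpus with a whitespace tokenizer standing in for the
# model's real subword tokenizer.
corpus = ["il gatto dorme", "il cane dorme", "il gatto mangia"]
kept = select_vocab(corpus, str.split, target_vocab_size=2, min_frequency=2)
print(kept)  # ['il', 'dorme']
```

In this toy run, `cane` and `mangia` occur once and are removed by the frequency threshold, and `gatto` is cut by the size limit after the alphabetical tie-break with `dorme`.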