vocabtrimmer/mbart-large-cc25-trimmed-ja


asahi417 committed on Apr 2, 2023

Commit 6f288af (1 parent: 78cd7d5)

commit files to HF hub

Files changed (1)

README.md +18 -0

README.md ADDED

	@@ -0,0 +1,18 @@

+# Vocabulary Trimmed [facebook/mbart-large-cc25](https://huggingface.co/facebook/mbart-large-cc25): `vocabtrimmer/mbart-large-cc25-trimmed-ja`
+This model is a trimmed version of [facebook/mbart-large-cc25](https://huggingface.co/facebook/mbart-large-cc25) produced by [`vocabtrimmer`](https://github.com/asahi417/lm-vocab-trimmer), a tool that trims the vocabulary of a language model to reduce its size.
+The following table summarizes the trimming process.
+|                            | facebook/mbart-large-cc25   | vocabtrimmer/mbart-large-cc25-trimmed-ja   |
+|:---------------------------|:----------------------------|:-------------------------------------------|
+| parameter_size_full        | 610,851,840                 | 434,447,360                                |
+| parameter_size_embedding   | 512,055,296                 | 159,246,336                                |
+| vocab_size                 | 250,027                     | 77,757                                     |
+| compression_rate_full      | 100.0                       | 71.12                                      |
+| compression_rate_embedding | 100.0                       | 31.1                                       |
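
The compression rates in the table can be reproduced from the parameter counts. The snippet below is a minimal sketch for checking the arithmetic, not part of `vocabtrimmer`; the variable and function names are illustrative. It also tests one observation that the model card does not state: both embedding counts equal `vocab_size * 1024 * 2`, which would be consistent with a hidden size of 1024 and the embedding matrix being counted twice.

```python
# Figures copied from the summary table above.
original = {"full": 610_851_840, "embedding": 512_055_296, "vocab": 250_027}
trimmed = {"full": 434_447_360, "embedding": 159_246_336, "vocab": 77_757}


def compression_rate(trimmed_size: int, original_size: int) -> float:
    """Trimmed size as a percentage of the original size, rounded to 2 places."""
    return round(100 * trimmed_size / original_size, 2)


rate_full = compression_rate(trimmed["full"], original["full"])
rate_embedding = compression_rate(trimmed["embedding"], original["embedding"])

# Assumption (not stated in the model card): hidden size 1024, with the
# embedding matrix counted twice in parameter_size_embedding.
HIDDEN_SIZE = 1024
embedding_matches = all(
    m["embedding"] == m["vocab"] * HIDDEN_SIZE * 2 for m in (original, trimmed)
)

print(rate_full, rate_embedding, embedding_matches)
```

The computed rates agree with the table's 71.12 and 31.1, so both rates are percentages of the original size (lower means smaller).
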
+The following table shows the parameters used to trim the vocabulary.
+| language   | dataset                     | dataset_column   | dataset_name   | dataset_split   | target_vocab_size   |   min_frequency |
+|:-----------|:----------------------------|:-----------------|:---------------|:----------------|:--------------------|----------------:|
+| ja         | vocabtrimmer/mc4_validation | text             | ja             | validation      |                     |               2 |
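
To illustrate what a `min_frequency` threshold means, here is a minimal sketch of frequency-based vocabulary selection. It is not `vocabtrimmer`'s implementation. Whitespace splitting stands in for the model's real tokenizer, the `>=` comparison is an assumption, and `select_vocab`, its default special tokens, and the toy corpus are all hypothetical.

```python
from collections import Counter


def select_vocab(texts, min_frequency=2,
                 special_tokens=("<s>", "</s>", "<pad>", "<unk>")):
    """Map each token seen at least min_frequency times to a new, compact id."""
    # Whitespace splitting is a stand-in for a real subword tokenizer.
    counts = Counter(tok for text in texts for tok in text.split())
    kept = [tok for tok, n in counts.items() if n >= min_frequency]
    # Special tokens are kept regardless of frequency; sorting the rest
    # makes the new ids deterministic.
    vocab = list(special_tokens) + sorted(t for t in kept if t not in special_tokens)
    return {tok: idx for idx, tok in enumerate(vocab)}


# Toy corpus (hypothetical); the real run used the dataset listed in the table.
corpus = ["猫 が 好き", "犬 が 好き", "鳥 も 好き"]
new_vocab = select_vocab(corpus, min_frequency=2)
print(new_vocab)
```

In this toy corpus only `が` and `好き` occur at least twice, so every other token is dropped and the surviving tokens receive new ids after the special tokens.
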