Request Fork with Modifications for Python GenAI App Development on Microsoft OS

#5
by MartialTerran - opened

Thank you for releasing these models. However, at 1.5B or 3B parameters, the coder models should be more specialized and have a smaller token vocabulary: "vocab_size": 151936 is far too large for a 1.5B or 3B model.

I propose and request that you develop/train specialized 1.5B and 3B Qwen coder models that are experts at coding ONLY (Python and its libraries, plus HTML, JS, CSS, Bash, PowerShell, and Microsoft-OS-related languages/features), and only the English language. These limitations would give even a tiny 1.5B or 3B model a fair chance of providing reliable service for local Python/GenAI/Windows work. By unnecessarily adding support and tokens for extraneous coding languages and foreign languages that are not needed for US GenAI app development on MS Windows machines, you are CRIPPLING the model's potential for compute-efficient operation on local edge machines. Until the smaller models are optimized through specialization, they will be mere toys with no real coding potential. A token vocabulary of over 150,000 entries is excessive and inappropriate for specialist 1.5B and 3B models. Tokens not needed to support the prescribed specialization should be excised from the specialized models, to reduce the compute needed in training and inference and to make more effective use of the limited 1.5B parameters.
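To put a rough number on the cost of that vocabulary, here is a back-of-envelope sketch. The hidden size of 1536 and the ~1.54B total parameter count are assumptions for the 1.5B model (check the model's config.json for exact values), and small models typically tie input and output embeddings, so the table is counted once here:

```python
# Rough share of a 1.5B model consumed by a 151,936-token embedding table.
# hidden_size and total_params are assumed values, not taken from config.json.
vocab_size = 151_936
hidden_size = 1536            # assumed hidden size of the 1.5B model
total_params = 1_540_000_000  # assumed total parameter count

embed_params = vocab_size * hidden_size  # one (tied) embedding matrix
share = embed_params / total_params
print(f"{embed_params:,} embedding params, {share:.1%} of the model")
```

Under these assumptions the embedding table alone is roughly 233M parameters, on the order of 15% of the whole model, which is the overhead the proposal above wants to shrink.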
Additionally, please provide a slimmed-down Python script to train and run inference on the Qwen models locally, with or without GPUs: that is, a script that does not invoke the cumbersome Hugging Face "transformers" library, which is very verbose and has high memory overhead. The tokenizer script should also be standalone, not based on calls to the inflexible "AutoTokenizer" from Hugging Face.
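As a sketch of what a standalone tokenizer could look like, here is a simplified byte-pair-merge loop in pure Python with no AutoTokenizer import. The merge table is hypothetical, and real BPE applies merges in learned priority order rather than greedily left to right; a real replacement would load Qwen's vocab.json/merges.txt:

```python
# Simplified standalone BPE sketch (no transformers / AutoTokenizer).
# The `merges` dict is hypothetical; real tokenizers load learned merges
# and apply them in priority order, not greedily as done here.

def bpe_encode(text: str, merges: dict) -> list:
    tokens = list(text)                     # start from single characters
    changed = True
    while changed:
        changed = False
        out, i = [], 0
        while i < len(tokens):
            pair = tuple(tokens[i:i + 2])
            if len(pair) == 2 and pair in merges:
                out.append(merges[pair])    # merge the adjacent pair
                i += 2
                changed = True
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens

merges = {("d", "e"): "de", ("de", "f"): "def"}
print(bpe_encode("def", merges))  # ['def']
```

The point of the sketch is the interface: a few dozen lines of stdlib Python can replace the AutoTokenizer dependency for a fixed, known vocabulary.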

This line should not be required to train and run these models locally: "from transformers import AutoModelForCausalLM, AutoTokenizer"
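To illustrate, the generation side of such a script reduces to a plain greedy-decoding loop with no transformers dependency. Everything below is a sketch: `toy_model` is a stand-in for a hand-written forward pass over weights loaded locally (e.g. from a safetensors file), and the vocabulary and EOS id are made up:

```python
# Minimal greedy-decoding loop with no transformers import.
# `model` is any callable mapping a token-id list to next-token logits.

def greedy_decode(model, prompt_ids, max_new_tokens, eos_id):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model(ids)                 # logits over the vocabulary
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:               # stop at end-of-sequence
            break
    return ids

# Hypothetical stand-in model: always predicts (last_id + 1) mod 5.
def toy_model(ids):
    logits = [0.0] * 5
    logits[(ids[-1] + 1) % 5] = 1.0
    return logits

print(greedy_decode(toy_model, [0], 10, eos_id=4))  # [0, 1, 2, 3, 4]
```

The same loop works whether the forward pass runs on CPU or GPU; the only transformers-specific machinery it replaces is `model.generate()`.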
These changes would promote wider public deployment, development, and finetuning of Qwen models.
