
Will bigger models be made available or models for specific languages?

#13
by nonetrix - opened

I can run models all the way up to 120B parameters on my system with 64 GB of RAM when they are in GGUF format and quantized to 2 bits, so it would be nice to see bigger models for sure. Personally, I'd perhaps prefer MoE-type models, just because inference on CPU can get really painful with gigantic models, with speeds like 0.5 tokens a second at 120B, which I can only barely tolerate. Also, what about more specialized models? For my use cases I just want it to be able to speak Japanese and English. Also, how good is this model's understanding of Japanese grammar? That is my main use case, but no model so far has really been able to deliver besides GPT 3 etc. I want to be able to prompt it with something like this, as just a very simple example, but it could be expanded to other languages:

Please break down the grammar of the following sentence: 私の名前は田中です

Expected output:

  • 私 (わたし) = "I" or "me"
  • の = particle indicating possession or association
  • 名前 (なまえ) = "name"
  • は = topic marker, indicating the topic of the sentence
  • 田中 (たなか) = a common Japanese surname
  • です = copula, equivalent to "is" in English
Cohere For AI org
edited Feb 20

Hi @nonetrix

We fine-tuned mT5 models as part of the Aya project because mT5 was pretrained on 101 languages, which are the languages we wanted to cover with instruction tuning.
The largest mT5 model is 13B, which is also the size of the aya-101 model, so this is the largest model size we support.

In terms of building specialised models, you can try out the aya-101 model for your use case. If you feel you're not getting the desired results, you can fine-tune it further on Japanese and create a specialised version that performs well for your specific use case.
The core idea behind the aya-101 model is to have a general-purpose model that's capable of following instructions in 101 languages. People can then use it to build specialised models for their own use cases if required.
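
For trying the model on the grammar prompt from the original post, a minimal sketch with the Transformers library could look like the following. It assumes the checkpoint is published on the Hub as `CohereForAI/aya-101` and that it loads through the standard seq2seq classes (it is tagged t5). The helper names `build_grammar_prompt` and `generate_breakdown` and the `max_new_tokens` value are illustrative only, not something from the Aya project.

```python
def build_grammar_prompt(sentence: str) -> str:
    """Wrap a sentence in the instruction used in the original post."""
    return f"Please break down the grammar of the following sentence: {sentence}"


def generate_breakdown(
    sentence: str,
    checkpoint: str = "CohereForAI/aya-101",  # assumed Hub id for aya-101
    max_new_tokens: int = 256,  # illustrative value, tune as needed
) -> str:
    """Load the model and return its answer to the grammar prompt."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # A 13B checkpoint needs a lot of memory when loaded unquantized.
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

    inputs = tokenizer(build_grammar_prompt(sentence), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_breakdown("私の名前は田中です")` would download the weights and return whatever the model generates. If the answer is not close to the expected breakdown, that is the point at which further fine-tuning on Japanese would be worth considering.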

Hope that resolves your queries! Let me know if you have any other questions; otherwise, feel free to close this discussion.

shivi changed discussion status to closed
