The kraken has awakened! A Game-Changer in LLM Flexibility and Performance!
Over the past few weeks, VAGO solutions teamed up with Cognitive Computations and HyperSpace to develop a groundbreaking architecture that redefines flexibility in combining different LLMs into a single model.
What can it do?
- Versatile Architecture: Kraken allows the seamless combination of LLMs with varying sizes, quantizations, and model architectures. It currently supports 4-bit, 8-bit, and AWQ quantization, with more on the way, and it runs on Hugging Face Transformers 4.40+.
- Kraken Router: Utilizing a custom sequence classification model with a context length of 32k tokens, the Kraken Router directs each input to the most suitable expert based on its characteristics (see the routing sketch after this list).
- Adaptability: Enhanced input formatting supports the model's adaptability to diverse conversational contexts.
- Extreme Versatility: Easily swap experts within Kraken for your specific use cases without retraining the entire model. For example, if you've built a Kraken for coding in Python, you can upgrade your Python expert without retraining the router, or add a C# expert by retraining only the router.
- Open Source Pipeline: We're sharing the entire pipeline, including router creation, training, architecture setup, and Kraken inference, as Jupyter notebooks: https://github.com/cognitivecomputations/kraken
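To make the routing step concrete, here is a minimal sketch of how a sequence-classification router can map a prompt to an expert name. The router checkpoint ID and expert labels below are placeholders rather than published artifacts; the actual wiring and label-to-expert mapping are defined in the pipeline notebooks linked above.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical router checkpoint and expert labels, for illustration only.
ROUTER_ID = "your-org/kraken-router"           # placeholder, not a released model
EXPERTS = ["reasoning-expert", "code-expert"]  # placeholder expert names

tokenizer = AutoTokenizer.from_pretrained(ROUTER_ID)
router = AutoModelForSequenceClassification.from_pretrained(ROUTER_ID)

def pick_expert(prompt: str) -> str:
    """Classify the prompt and return the name of the expert it should be routed to."""
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = router(**inputs).logits
    return EXPERTS[int(logits.argmax(dim=-1))]

print(pick_expert("Write a quicksort in Python."))  # e.g. "code-expert"
```

Because each expert is referenced by name rather than baked into the weights, upgrading an expert within an existing category leaves the router untouched; only adding a new category of expert calls for retraining it.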
We proudly introduce the first two Kraken models, which integrate top-tier LLMs and multilingual capabilities:
- cognitivecomputations/Kraken
- VAGOsolutions/Kraken-Multilingual
Right now they're supported by the Hugging Face Transformers library. We'd love to see integration into vLLM and text-generation-webui!
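As a rough illustration of how inference might look with the released models (a hedged sketch, not the definitive API: whether the Hub repo exposes a top-level tokenizer and chat template is an assumption, and the notebooks in the pipeline repository are the authoritative reference):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Kraken ships custom modeling code on the Hub, so trust_remote_code is required.
model_id = "cognitivecomputations/Kraken"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Assumes the repo provides a chat template; adjust to the notebooks if it does not.
messages = [{"role": "user", "content": "Explain what a kraken is in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# The router classifies the prompt and hands it to the best-matching expert.
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```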