---
license: apache-2.0
tags:
- Text
- Text Generation
- Transformers
- English
- mixtral
- Merge
- Quantization
- MoE
- tinyllama
---

This is a q5_K_M GGUF quantization of https://huggingface.co/s3nh/TinyLLama-4x1.1B-MoE.

I have not benchmarked it, and it is my first quantization, so fingers crossed.

It is a Mixture of Experts model with https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0 as its base model.

The other 3 models in the merge are:

https://huggingface.co/78health/TinyLlama_1.1B-function-calling

https://huggingface.co/phanerozoic/Tiny-Pirate-1.1b-v0.1

https://huggingface.co/Tensoic/TinyLlama-1.1B-3T-openhermes

I make no claim to any of the development. I simply wanted to try the model out, so I quantized it and thought I'd share it in case anyone else was feeling experimental.

-------

Default Modelfile (from the Modelfile for tinyllama on Ollama):

    TEMPLATE """<|system|>
    {{ .System }}
    <|user|>
    {{ .Prompt }}
    <|assistant|>
    """
    # Tweak this to adjust personality etc.
    SYSTEM """You are a helpful AI assistant."""
    PARAMETER stop "<|system|>"
    PARAMETER stop "<|user|>"
    PARAMETER stop "<|assistant|>"
    PARAMETER stop ""

-------

Model card from https://huggingface.co/s3nh/TinyLLama-4x1.1B-MoE

Example usage:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed values: pick a device and a generation length that suit your setup.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    max_length = 256

    tokenizer = AutoTokenizer.from_pretrained("s3nh/TinyLLama-4x1.1B-MoE")
    model = AutoModelForCausalLM.from_pretrained("s3nh/TinyLLama-4x1.1B-MoE").to(device)

    input_text = """
    ###Input:
    You are a pirate. tell me a story about wrecked ship.
    ###Response:
    """

    # Tokenize the prompt and move it to the same device as the model.
    input_ids = tokenizer.encode(input_text, return_tensors='pt').to(device)

    # Sample a completion; the attention mask covers every prompt token.
    output = model.generate(inputs=input_ids,
                            max_length=max_length,
                            do_sample=True,
                            top_k=10,
                            temperature=0.7,
                            pad_token_id=tokenizer.eos_token_id,
                            attention_mask=input_ids.new_ones(input_ids.shape))
    print(tokenizer.decode(output[0], skip_special_tokens=True))

This model was made possible by the tremendous work of the mergekit developers. I decided to merge TinyLlama models to create a mixture of experts.
The config used was:

    base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    experts:
      - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
        positive_prompts:
        - "chat"
        - "assistant"
        - "tell me"
        - "explain"
      - source_model: 78health/TinyLlama_1.1B-function-calling
        positive_prompts:
        - "code"
        - "python"
        - "javascript"
        - "programming"
        - "algorithm"
      - source_model: phanerozoic/Tiny-Pirate-1.1b-v0.1
        positive_prompts:
        - "storywriting"
        - "write"
        - "scene"
        - "story"
        - "character"
      - source_model: Tensoic/TinyLlama-1.1B-3T-openhermes
        positive_prompts:
        - "reason"
        - "provide"
        - "instruct"
        - "summarize"
        - "count"
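
If you run the GGUF file outside Ollama, you have to apply the prompt layout yourself. The following is a minimal sketch of what the Modelfile TEMPLATE and stop parameters shown earlier amount to, written as plain Python string handling. The names `build_prompt`, `truncate_at_stop`, `DEFAULT_SYSTEM` and `STOP_MARKERS` are hypothetical helpers made up for illustration; they are not part of Ollama, llama.cpp, or any other library, and the layout simply copies the template as it appears in this card.

```python
# Hypothetical helpers that mirror the Modelfile TEMPLATE / PARAMETER stop
# lines from this card. They are illustrative only, not a library API.

DEFAULT_SYSTEM = "You are a helpful AI assistant."
STOP_MARKERS = ("<|system|>", "<|user|>", "<|assistant|>")


def build_prompt(user_prompt: str, system: str = DEFAULT_SYSTEM) -> str:
    """Fill the system and user text into the chat layout from the TEMPLATE."""
    return (
        "<|system|>\n"
        f"{system}\n"
        "<|user|>\n"
        f"{user_prompt}\n"
        "<|assistant|>\n"
    )


def truncate_at_stop(text: str, stops=STOP_MARKERS) -> str:
    """Cut generated text at the earliest stop marker, if any is present."""
    cut = len(text)
    for marker in stops:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```

You would pass the string returned by `build_prompt("Tell me a story about a wrecked ship.")` to whatever runtime loads the GGUF file, then run `truncate_at_stop` over the raw completion if that runtime does not handle stop strings for you.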