---
license: apache-2.0
tags:
- Text
- Text Generation
- Transformers
- English
- mixtral
- Merge
- Quantization
- MoE
- tinyllama
---

This is a q5_K_M GGUF quantization of https://huggingface.co/s3nh/TinyLLama-4x1.1B-MoE.

I'm not sure how well it performs, and this is my first quantization, so fingers crossed.

It is a Mixture of Experts model with https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0 as its base model.

The other 3 models in the merge are:

https://huggingface.co/78health/TinyLlama_1.1B-function-calling

https://huggingface.co/phanerozoic/Tiny-Pirate-1.1b-v0.1

https://huggingface.co/Tensoic/TinyLlama-1.1B-3T-openhermes

I make no claims to any of the development. I simply wanted to try it out, so I quantized it and thought I'd share it in case anyone else was feeling experimental.

-------

Default Modelfile settings (from the Modelfile for tinyllama on Ollama):

    TEMPLATE """<|system|>
    {{ .System }}</s>
    <|user|>
    {{ .Prompt }}</s>
    <|assistant|>
    """

    # Tweak this to adjust personality etc.
    SYSTEM """You are a helpful AI assistant."""

    PARAMETER stop "<|system|>"
    PARAMETER stop "<|user|>"
    PARAMETER stop "<|assistant|>"
    PARAMETER stop "</s>"

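If you want to use the same prompt layout outside Ollama, the template and stop strings above can be sketched in plain Python. This is only a rough sketch: the helper names `render_chat_prompt` and `truncate_at_stop` are made up for illustration, and I'm assuming the template renders with a newline after each tag line, as written above.

```python
# Hypothetical helpers that mirror the TEMPLATE and PARAMETER stop lines above.
STOP_SEQUENCES = ["<|system|>", "<|user|>", "<|assistant|>", "</s>"]


def render_chat_prompt(user_prompt: str,
                       system: str = "You are a helpful AI assistant.") -> str:
    """Fill the chat layout with a system message and a user prompt."""
    return (
        "<|system|>\n"
        f"{system}</s>\n"
        "<|user|>\n"
        f"{user_prompt}</s>\n"
        "<|assistant|>\n"
    )


def truncate_at_stop(text: str, stops=STOP_SEQUENCES) -> str:
    """Cut generated text at the earliest stop sequence, if one appears."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```

Pass the rendered prompt to whatever runtime you use, then run the raw completion through `truncate_at_stop` if that runtime does not handle stop strings itself.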

-------

Model card from https://huggingface.co/s3nh/TinyLLama-4x1.1B-MoE

Example usage:

    import torch
    from transformers import AutoModelForCausalLM
    from transformers import AutoTokenizer

    # Assumed settings: choose the device and length limit that suit your setup.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    max_length = 256

    # Load the tokenizer and the merged MoE model from the same repository.
    tokenizer = AutoTokenizer.from_pretrained("s3nh/TinyLLama-4x1.1B-MoE")
    model = AutoModelForCausalLM.from_pretrained("s3nh/TinyLLama-4x1.1B-MoE").to(device)

    input_text = """
    ###Input: You are a pirate. tell me a story about wrecked ship.
    ###Response:
    """

    # Tokenize the prompt and sample a completion.
    input_ids = tokenizer.encode(input_text, return_tensors='pt').to(device)
    output = model.generate(inputs=input_ids,
                            max_length=max_length,
                            do_sample=True,
                            top_k=10,
                            temperature=0.7,
                            pad_token_id=tokenizer.eos_token_id,
                            attention_mask=input_ids.new_ones(input_ids.shape))
    print(tokenizer.decode(output[0], skip_special_tokens=True))

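The decoded output of a causal LM contains the prompt followed by the completion, so it can be handy to build the `###Input:` / `###Response:` prompt in one place and strip the echo afterwards. A minimal sketch, with the helper names `build_instruct_prompt` and `extract_response` invented here for illustration:

```python
# Hypothetical helpers for the ###Input / ###Response layout used in the example.
RESPONSE_MARKER = "###Response:"


def build_instruct_prompt(instruction: str) -> str:
    """Wrap an instruction in the same layout as the example prompt."""
    return f"\n###Input: {instruction.strip()}\n{RESPONSE_MARKER}\n"


def extract_response(decoded: str) -> str:
    """Return only the text after the first response marker.

    Falls back to the whole string if the marker is missing.
    """
    _, found, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if found else decoded.strip()
```

With these, `input_text = build_instruct_prompt("You are a pirate. tell me a story about wrecked ship.")` reproduces the prompt from the example, and `extract_response(...)` can be applied to the string returned by `tokenizer.decode`.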

This model was made possible by the tremendous work of the mergekit developers. I decided to merge TinyLlama models to create a mixture of experts. The config used is below:

    """base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    experts:
      - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
        positive_prompts:
        - "chat"
        - "assistant"
        - "tell me"
        - "explain"
      - source_model: 78health/TinyLlama_1.1B-function-calling
        positive_prompts:
        - "code"
        - "python"
        - "javascript"
        - "programming"
        - "algorithm"
      - source_model: phanerozoic/Tiny-Pirate-1.1b-v0.1
        positive_prompts:
        - "storywriting"
        - "write"
        - "scene"
        - "story"
        - "character"
      - source_model: Tensoic/TinyLlama-1.1B-3T-openhermes
        positive_prompts:
        - "reason"
        - "provide"
        - "instruct"
        - "summarize"
        - "count"
    """