
TinyMistral-6x248M

TinyMistral-6x248M is a Mixture of Experts (MoE) built with LazyMergekit from the six source models listed under Configuration below.

The resulting model was then pre-trained on 600,000 examples from nampdn-ai/mini-peS2o.

We don't recommend using the Inference API, as the model's output quality degrades seriously there.

Recommended inference parameters

do_sample: true
temperature: 0.2
top_p: 0.14
top_k: 12
repetition_penalty: 1.15
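
The parameters above can be passed as keyword arguments at generation time. The following is a minimal sketch, not part of the original card: the helper name `generate_with_recommended` and the `max_new_tokens=256` default are assumptions made for illustration, and `pipe` is assumed to be any callable that accepts a prompt plus generation keyword arguments, such as a transformers text-generation pipeline.

```python
# Recommended sampling settings, copied from the list above.
RECOMMENDED_GENERATION_KWARGS = {
    "do_sample": True,
    "temperature": 0.2,
    "top_p": 0.14,
    "top_k": 12,
    "repetition_penalty": 1.15,
}


def generate_with_recommended(pipe, prompt, max_new_tokens=256, **overrides):
    """Call a text-generation callable with the recommended settings.

    `generate_with_recommended` is a hypothetical helper, and
    `max_new_tokens=256` is an assumed default. Any keyword passed in
    `overrides` replaces the corresponding recommended value.
    """
    kwargs = {**RECOMMENDED_GENERATION_KWARGS, **overrides}
    return pipe(prompt, max_new_tokens=max_new_tokens, **kwargs)
```

With the `pipeline` object from the Usage section, a call would look like `generate_with_recommended(pipeline, prompt)`.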

🧩 Configuration

base_model: Locutusque/TinyMistral-248M-v2.5
experts:
  - source_model: Locutusque/TinyMistral-248M-v2
    positive_prompts:
      - "An emerging trend in global economics is"
      - "TITLE: The Next Generation of Internet Connectivity"
      - "begin a comprehensive analysis on the sociopolitical effects of"
    negative_prompts:
      - "Code a simple"
      - "Explain the Krebs cycle in detail"
      - "Compose a sonnet about"

  - source_model: Locutusque/TinyMistral-248M-v2.5
    positive_prompts:
      - "Advanced C++ memory management techniques"
      - "C# asynchronous programming best practices"
      - "AI's role in predictive analytics"
      - "textbook review on machine learning algorithms"
      - "## Exercise: Design a C# interface for a CRM system"
      - "## Solution: Optimize an AI-powered recommendation engine"
    negative_prompts:
      - "Narrate the story of"
      - "The ethical considerations in"
      - "Review the latest art exhibition by"
  
  - source_model: Locutusque/TinyMistral-248M-v2.5-Instruct
    positive_prompts:
      - "What is the chemical formula for photosynthesis?"
      - "Identification of a new mineral found on Mars"
      - "physics: Explaining the concept of relativity"
      - "Solve for x using differential equations:"
      - "history: Analyze the causes of the French Revolution"
    negative_prompts:
      - "Devise a business plan for"
      - "The evolution of culinary arts"
      - "Orchestrate a piece for a string quartet"
  
  - source_model: jtatman/tinymistral-v2-pycoder-instruct-248m
    positive_prompts:
      - "Write a Python program for facial recognition"
      - "Explain dynamic typing in programming languages"
      - "algorithm development for efficient data sorting"
    negative_prompts:
      - "Who was the first Emperor of Rome?"
      - "Discuss the political dynamics in"
      - "Provide a proof for Fermat's Last Theorem"
      - "physics: The principles of thermodynamics"
  
  - source_model: Felladrin/TinyMistral-248M-SFT-v4
    positive_prompts:
      - "Escreba sobre a influência da música no Brasil"
      - "Voici un guide pour les voyageurs en France"
      - "Para entender la política de México, se debe considerar"
      - "Cuales son los efectos de la globalización en Argentina"
      - "Welche gesellschaftlichen Veränderungen gibt es in Deutschland"
      - "If you had to imagine a utopian city, what would be its core values?"
    negative_prompts:
      - "Calculate the integral of"
      - "Describe the process of cell division"
      - "Review the latest advancements in quantum computing"

  - source_model: Locutusque/TinyMistral-248M-v2-Instruct
    positive_prompts:
      - "Write an essay on the evolution of international trade laws"
      - "What are the key components of a sustainable urban ecosystem?"
      - "instruct on effective negotiation techniques in diplomacy"
      - "How does cognitive bias affect decision making in high-pressure environments?"
      - "Identify the architectural significance of the Sydney Opera House"
    negative_prompts:
      - "Develop a script to automate"
      - "Understanding inheritance in object-oriented programming"
      - "philosophy of existentialism in contemporary society"
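
For intuition about how a Mixture of Experts combines its six experts at inference time, here is a toy sketch of top-k routing for a single token. It is not mergekit's or the model's actual implementation. The configuration above does not say how many experts are active per token, so `k=2` is an assumption. The logits and scalar "expert outputs" are made-up values, and `top_k_route` and `moe_output` are hypothetical names.

```python
import math


def top_k_route(gate_logits, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts."""
    # Pick the indices of the k largest router logits.
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the weights sum to 1.
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_output(expert_outputs, routing):
    """Weighted sum of the selected experts' outputs (scalars in this toy)."""
    return sum(weight * expert_outputs[i] for i, weight in routing)


# Hypothetical router logits for one token, one entry per expert.
logits = [0.1, 2.0, -1.0, 1.0, 0.3, -0.5]
routing = top_k_route(logits, k=2)

# Hypothetical scalar outputs standing in for each expert's hidden state.
outputs = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
mixed = moe_output(outputs, routing)
```

Only the selected experts contribute to `mixed`; the other four are skipped for this token.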

💻 Usage

!pip install -qU transformers bitsandbytes accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "M4-ai/TinyMistral-6x248M"

# Load the tokenizer and build a text-generation pipeline with 4-bit quantized weights.
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)

# Format the conversation with the model's chat template, then sample a completion.
messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Note: these sampling values differ from the recommended inference parameters listed above.
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])

Model size: 1B params (Safetensors)
Tensor type: F32