Edit model card
Configuration Parsing Warning: In config.json: "quantization_config.bits" must be an integer

Tesoro

Tess-2.0-Mixtral-8x22B

Tess, short for Tesoro (Treasure in Italian), is a general purpose Large Language Model series. Tess-2.0-Mixtral-8x22B was trained on the mistral-community/Mixtral-8x22B-v0.1 base.

Prompt Format

SYSTEM: <ANY SYSTEM CONTEXT>
USER: 
ASSISTANT: 

Training Methodology

Tess-2.0-Mixtral-8x22B was trained on the Tess-2.0 dataset. Tess-2.0 dataset and the training methodology follows LIMA (Less-Is-More) principles, and contains ~25K high-quality code and general training samples. The dataset is highly uncensored, hence the model will almost always follow instructions.

The model was only fine-tuned for 1-epoch to try and preserve its entropy as much as possible.

Sample code to run inference

import torch, json
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "migtissera/Tess-2.0-Mixtral-8x22B"
output_file_path = "./conversations.jsonl"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
    load_in_8bit=False,
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)


def generate_text(instruction):
    tokens = tokenizer.encode(instruction)
    tokens = torch.LongTensor(tokens).unsqueeze(0)
    tokens = tokens.to("cuda")

    instance = {
        "input_ids": tokens,
        "top_p": 1.0,
        "temperature": 0.5,
        "generate_len": 1024,
        "top_k": 50,
    }

    length = len(tokens[0])
    with torch.no_grad():
        rest = model.generate(
            input_ids=tokens,
            max_length=length + instance["generate_len"],
            use_cache=True,
            do_sample=True,
            top_p=instance["top_p"],
            temperature=instance["temperature"],
            top_k=instance["top_k"],
            num_return_sequences=1,
        )
    output = rest[0][length:]
    string = tokenizer.decode(output, skip_special_tokens=True)
    answer = string.split("USER:")[0].strip()
    return f"{answer}"


conversation = f"SYSTEM: Answer the question thoughtfully and intelligently. Always answer without hesitation."


while True:
    user_input = input("You: ")
    llm_prompt = f"{conversation} \nUSER: {user_input} \nASSISTANT: "
    answer = generate_text(llm_prompt)
    print(answer)
    conversation = f"{llm_prompt}{answer}"
    json_data = {"prompt": user_input, "answer": answer}

    ## Save your conversation
    with open(output_file_path, "a") as output_file:
        output_file.write(json.dumps(json_data) + "\n")

Join My General AI Discord (NeuroLattice):

https://discord.gg/Hz6GrwGFKD

Limitations & Biases:

While this model aims for accuracy, it can occasionally produce inaccurate or misleading results.

Despite diligent efforts in refining the pretraining data, there remains a possibility for the generation of inappropriate, biased, or offensive content.

Exercise caution and cross-check information when necessary. This is an uncensored model.

Downloads last month
2
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.