
Llama-3-70B-Synthia-v3.5

Llama-3-70B-Synthia-v3.5 (Synthetic Intelligent Agent) is a general-purpose Large Language Model (LLM). It was trained on the Synthia-v3.5 dataset, which contains varied system contexts, plus some other publicly available datasets.

It has been fine-tuned for instruction following and for long-form conversations.

Compute for Llama-3-70B-Synthia-v3.5 was sponsored by KindoAI.




Evaluation

We evaluated Llama-3-70B-Synthia-v3.5 on a wide range of tasks using the Language Model Evaluation Harness from EleutherAI.

The metrics listed here are those used by the HuggingFaceH4 Open LLM Leaderboard. Values are to follow.

Task Metric Value
arc_challenge acc_norm
hellaswag acc_norm
mmlu acc_norm
truthfulqa_mc mc2
Total Average -
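
As a rough sketch of how such a run could be launched, the snippet below only builds and prints `lm_eval` command lines for the four tasks; it does not execute anything. The few-shot counts are an assumption taken from the original Open LLM Leaderboard setup (the card does not state them), the task names follow the table above and may be spelled differently in your harness version, and `build_eval_command` is a hypothetical helper.

```python
import shlex

# Assumed few-shot counts (original Open LLM Leaderboard convention);
# the model card does not state which values were used for this model.
FEWSHOT = {"arc_challenge": 25, "hellaswag": 10, "mmlu": 5, "truthfulqa_mc": 0}


def build_eval_command(model_path, task, num_fewshot):
    """Return the argv list for one lm_eval run (nothing is executed here)."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_path},dtype=float16",
        "--tasks", task,
        "--num_fewshot", str(num_fewshot),
        "--batch_size", "1",
    ]


model_path = "/home/migel/Llama-3-70B-Synthia-v3.5"
commands = [build_eval_command(model_path, t, n) for t, n in FEWSHOT.items()]

for cmd in commands:
    # Print a copy-pasteable shell command for each task.
    print(shlex.join(cmd))
```

Running one command per task keeps the per-task few-shot setting explicit; check the task list of your installed harness before using the names as-is.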

Sample code to run inference

import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/home/migel/Llama-3-70B-Synthia-v3.5"
output_file_path = "/home/migel/conversations.jsonl"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
    load_in_4bit=False,
    trust_remote_code=False,
)

# Llama 3 uses a standard tokenizer, so remote code is not needed (matches the model load above).
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=False)

def generate_text(instruction):
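    # The prompt already starts with <|begin_of_text|>, so do not add special tokens again.
    tokens = tokenizer.encode(instruction, add_special_tokens=False, return_tensors="pt")
    # With device_map="auto", send inputs to the device that holds the first model shard.
    tokens = tokens.to(model.device)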

    instance = {
        "input_ids": tokens,
        "top_p": 1.0,
        "temperature": 0.75,
        "generate_len": 1024,
        "top_k": 50,
    }

    length = len(tokens[0])
    with torch.no_grad():
        rest = model.generate(
            input_ids=tokens,
            max_length=length + instance["generate_len"],
            use_cache=True,
            do_sample=True,
            top_p=instance["top_p"],
            temperature=instance["temperature"],
            top_k=instance["top_k"],
            num_return_sequences=1,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Keep only the newly generated tokens, dropping the prompt.
    output = rest[0][length:]
    return tokenizer.decode(output, skip_special_tokens=True)

conversation = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are Synthia, a helpful, female AI assistant. You always provide detailed answers without hesitation.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"""

while True:
    user_input = input("You: ")
    llm_prompt = f"{conversation}{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    answer = generate_text(llm_prompt)
    print(answer)

    conversation = f"{llm_prompt}{answer}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"

    json_data = {"prompt": user_input, "answer": answer}

    with open(output_file_path, "a") as output_file:
        output_file.write(json.dumps(json_data) + "\n")
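
The loop above builds the Llama 3 prompt by string concatenation, which can be hard to follow. A minimal sketch of the same layout as a pure function is shown below; `build_llama3_prompt` is a hypothetical helper, not part of the model's tooling, and the example strings are placeholders.

```python
def build_llama3_prompt(system, turns, next_user_message):
    """Assemble a Llama 3 style prompt.

    system: system prompt text
    turns: list of (user_message, assistant_reply) pairs already exchanged
    next_user_message: the message the model should answer next
    """
    prompt = (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
    )
    for user_message, assistant_reply in turns:
        prompt += f"<|start_header_id|>user<|end_header_id|>\n\n{user_message}<|eot_id|>"
        prompt += f"<|start_header_id|>assistant<|end_header_id|>\n\n{assistant_reply}<|eot_id|>"
    prompt += f"<|start_header_id|>user<|end_header_id|>\n\n{next_user_message}<|eot_id|>"
    # End with an open assistant header so the model writes the reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


example = build_llama3_prompt(
    system="You are Synthia.",
    turns=[("Hi", "Hello!")],
    next_user_message="What is 2 + 2?",
)
print(example)
```

Keeping the history as a list of pairs makes it easy to trim old turns when the conversation approaches the context limit, which the string-based loop does not do.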

Join My General AI Discord (NeuroLattice):

https://discord.gg/Hz6GrwGFKD

Limitations & Biases:

While this model aims for accuracy, it can occasionally produce inaccurate or misleading results.

Despite diligent efforts in refining the pretraining data, there remains a possibility for the generation of inappropriate, biased, or offensive content.

Exercise caution and cross-check information when necessary. This is an uncensored model.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric Value
Avg. 35.20
IFEval (0-Shot) 60.76
BBH (3-Shot) 49.12
MATH Lvl 5 (4-Shot) 18.96
GPQA (0-shot) 18.34
MuSR (0-shot) 23.39
MMLU-PRO (5-shot) 40.65
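
As a quick check of the table, the snippet below recomputes the Avg. row, assuming it is the unweighted mean of the six benchmark scores (the card does not say how it was derived).

```python
# Scores copied from the table above.
scores = {
    "IFEval (0-Shot)": 60.76,
    "BBH (3-Shot)": 49.12,
    "MATH Lvl 5 (4-Shot)": 18.96,
    "GPQA (0-shot)": 18.34,
    "MuSR (0-shot)": 23.39,
    "MMLU-PRO (5-shot)": 40.65,
}

# Assumption: Avg. is a plain arithmetic mean over the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")  # prints "Avg. 35.20"
```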
Model size: 70.6B params (Safetensors)
Tensor type: FP16