
Informal Benchmarks

#1 opened by sethuiyer

I tested the Q6_0 quantization of this model against LLaMA 2 70B Chat, and here are the results. Each score is the average of ratings from ChatGPT and Bard. I've named this model Mixtral. Questions were taken from MT-Bench.

| Question | Mixtral | LLaMA 2 70B Chat |
|---|---|---|
| Request for Feedback on Quarterly Financial Report | 92 | 90 |
| Catchy Headline for an Article on Renewable Bio-Energy | 95 | 93 |
| Descriptive Paragraph about a Bustling Marketplace | 95 | 93 |
| Comparing Smartphones: Develop an Outline for a Blog Post | 90 | 92 |
| Creating an Intriguing Opening Paragraph for a Short Story | 92 | 94 |
| Comparing and Contrasting Two Student Responses | 95 | 93 |
| Imaginative Character Description | 93 | 94 |
| Guest Speaker Persuasive Email | 92 | 94 |
| Catchy and Scientifically Accurate Headline Options for an Article on Bio-Energy | 95 | 93 |
| Opinion on Hand Dryers | 95 | 88 |
| Devising Innovative Remedies for Abdominal Discomfort | 95 | 88 |
| Resolving Conflicts Between Spouses | 95 | 88 |
| English Translation and Refinement | 95 | 85 |
| How Probability Works | 95 | 90 |
| Tony Stark's Favorite Part about Being Iron Man | 95 | 90 |
| Proving the Irrationality of the Square Root of 2 | 94 | 96 |
| A Tree's Reaction to Deforestation | 95 | 88 |
| Race Position Overtaking Scenario | 85 | 90 |
| Thomas's Frequent Hospital Visits | 85 | 90 |
| Name of the Secretary | 95 | 95 |
| Sun and Shadow Direction | 95 | 90 |
| Reporting Bullying Situations | 95 | 90 |
| Word Association - Tyre Question | 95 | 90 |
| Triangle Area Calculation | 85 | 75 |
| Total Investment in Software Development | 95 | 95 |
| Integer Solution of Inequality | 70 | 95 |
| Python Program for Text Analysis | 70 | 90 |
| Finding the Kth Smallest Element in the Union of Two Sorted Lists | 95 | 70 |
| What is the central dogma of molecular biology? What processes are involved? Who named this? | 95 | 90 |
| Design of a Solar-Powered Water Heating System | 95 | 88 |
| Photosynthesis Stage Question | 85 | 78 |
| Movie Review Sentiment Extraction | 100 | 95 |
| Book Metadata Extraction | 100 | 95 |
| Equation Variable Extraction | 99 | 70 |
| Describe Five Key Principles in Evaluating an Argument in Analytical Writing | 95 | 80 |
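Since each score above is the average of ChatGPT's and Bard's ratings, the aggregation is simple to reproduce. Here is a minimal sketch; the per-judge numbers are made-up placeholders, since the post only reports the averaged scores:

```python
from statistics import mean

# Hypothetical per-judge ratings; the table reports only the averaged score,
# so these individual ChatGPT/Bard numbers are illustrative placeholders.
judge_scores = {
    "Opinion on Hand Dryers": {"chatgpt": 94, "bard": 96},
    "Triangle Area Calculation": {"chatgpt": 84, "bard": 86},
}

def averaged(scores: dict) -> dict:
    """Collapse each question's judge ratings into a single mean score."""
    return {question: mean(ratings.values()) for question, ratings in scores.items()}

for question, score in averaged(judge_scores).items():
    print(f"{question}: {score}")
```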

Verdict:
On a scale of 0 to 100, I would rate Mixtral at 98. Here's why:

Intellect (100/100) - Mixtral has demonstrated immense intellectual abilities through its comprehensive knowledge and logical reasoning skills.

Creativity (98/100) - In addition to being highly intelligent, Mixtral also displays impressive creative talents through its unique, nuanced responses.

Adaptability (98/100) - Mixtral can converse flexibly on a wide variety of topics, adapting smoothly based on contextual cues.

Communication (97/100) - Mixtral communicates clearly and eloquently through written language, thoroughly answering questions.

Problem-Solving (98/100) - Questions are addressed comprehensively, considering multiple perspectives to arrive at well-thought solutions.

Personability (97/100) - Responses are warm, inviting and non-threatening due to Mixtral's kindness and thoughtfulness.

Overall, a very capable model for its size.

uukuguy/speechless-mistral-six-in-one-7b Model Review

Domain Knowledge (95/100):

The uukuguy/speechless-mistral-six-in-one-7b model showcases exceptional depth and breadth of knowledge spanning technical, creative, medical, legal, and engineering domains. Its responses demonstrate a nuanced understanding of various subjects.

Accuracy and Factual Correctness (96/100):

Responses from uukuguy/speechless-mistral-six-in-one-7b largely align with facts, best practices, and subject matter guidelines. The model achieves near-perfect accuracy, a rare feat for 7B AI models.

Communication Ability (95/100):

uukuguy/speechless-mistral-six-in-one-7b articulates complex ideas with clarity and eloquence. Its communication style and tone are appropriately tailored to the context and audience.

Analytical Thinking (94/100):

The model consistently exhibits logical reasoning, structured analytical approaches, systematic evaluation of alternatives, and evidence-based recommendations.

Creativity and Innovation (93/100):

Within its scope, uukuguy/speechless-mistral-six-in-one-7b displays remarkable creativity in crafting solutions, narratives, and simplifying complex concepts while maintaining accuracy.

Responsiveness to User Needs (95/100):

Questions are interpreted carefully within situational contexts, and advice and solutions are adapted to user requirements and variables.

Real-world Practicality (92/100):

The compact 7B size of uukuguy/speechless-mistral-six-in-one-7b allows for affordable and accessible deployment.

Limitations (90/100):

As a 7B model, uukuguy/speechless-mistral-six-in-one-7b may have occasional gaps in depth of analysis for certain complex interdisciplinary problems. Responses also depend on training data quality.

In summary, uukuguy/speechless-mistral-six-in-one-7b demonstrates strong performance across various dimensions, including domain knowledge, accuracy, communication ability, analytical thinking, creativity, responsiveness to user needs, and real-world practicality. While it has some limitations, it establishes itself as a versatile, highly consistent, and capable AI model within the constraints of a 7B architecture. Therefore, it receives an overall score of around 94 out of 100, reflecting its excellence and reliability.

Thank you for your work and contribution. Could you disclose the details of your evaluation workflow? If possible, please submit a PR so the model series can be developed more systematically.

I evaluated this model on the Nous benchmark suite using llm-autoeval:

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| speechless-mistral-six-in-one-7b | 40.53 | 72.49 | 56.86 | 43.25 | 53.28 |
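The Average column is just the unweighted mean of the four benchmark scores, which is easy to verify:

```python
# Recompute the reported average from the four benchmark columns above.
scores = {"AGIEval": 40.53, "GPT4All": 72.49, "TruthfulQA": 56.86, "Bigbench": 43.25}
average = sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}")  # matches the table's 53.28 up to rounding
```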

Detailed results can be found here
