Informal Benchmarks

#1, opened by sethuiyer

I tested the Q6_0 version of the model against Llama 2 70B Chat; the results are below. Each score is the average of the ratings given by ChatGPT and Bard. I refer to this model as Mixtral below. Questions are taken from MT-Bench.

| Question | Mixtral | Llama 2 70B Chat |
|---|---|---|
| Request for Feedback on Quarterly Financial Report | 92 | 90 |
| Catchy Headline for an Article on Renewable Bio-Energy | 95 | 93 |
| Descriptive Paragraph about a Bustling Marketplace | 95 | 93 |
| Comparing Smartphones: Develop an Outline for a Blog Post | 90 | 92 |
| Creating an Intriguing Opening Paragraph for a Short Story | 92 | 94 |
| Comparing and Contrasting Two Student Responses | 95 | 93 |
| Imaginative Character Description | 93 | 94 |
| Guest Speaker Persuasive Email | 92 | 94 |
| Catchy and Scientifically Accurate Headline Options for an Article on Bio-Energy | 95 | 93 |
| Opinion on Hand Dryers | 95 | 88 |
| Devising Innovative Remedies for Abdominal Discomfort | 95 | 88 |
| Resolving Conflicts Between Spouses | 95 | 88 |
| English Translation and Refinement | 95 | 85 |
| How Probability Works | 95 | 90 |
| Tony Stark's Favorite Part about Being Iron Man | 95 | 90 |
| Proving the Irrationality of Square Root of 2 | 94 | 96 |
| A Tree's Reaction to Deforestation | 95 | 88 |
| Race Position Overtaking Scenario | 85 | 90 |
| Thomas's Frequent Hospital Visits | 85 | 90 |
| Name of the Secretary | 95 | 95 |
| Sun and Shadow Direction | 95 | 90 |
| Reporting Bullying Situations | 95 | 90 |
| Word Association - Tyre Question | 95 | 90 |
| Triangle Area Calculation | 85 | 75 |
| Total Investment in Software Development | 95 | 95 |
| Integer Solution of Inequality | 70 | 95 |
| Python Program for Text Analysis | 70 | 90 |
| Finding the Kth Smallest Element in the Union of Two Sorted Lists | 95 | 70 |
| What is the central dogma of molecular biology? What processes are involved? Who named this? | 95 | 90 |
| Design of solar-powered water heating system | 95 | 88 |
| Photosynthesis Stage Question | 85 | 78 |
| Movie Review Sentiment Extraction | 100 | 95 |
| Book Metadata Extraction | 100 | 95 |
| Equation Variable Extraction | 99 | 70 |
| Describe five key principles in evaluating an argument in analytical writing | 95 | 80 |
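
For anyone who wants a quick aggregate of the table above, here is a minimal sketch that computes the mean score per model and a win/tie/loss tally. The scores are transcribed from the table in row order; the variable names are my own, and the aggregation is not part of the original post.

```python
from statistics import mean

# Scores transcribed from the table above, in row order (35 questions).
mixtral = [92, 95, 95, 90, 92, 95, 93, 92, 95, 95, 95, 95, 95, 95, 95, 94, 95, 85,
           85, 95, 95, 95, 95, 85, 95, 70, 70, 95, 95, 95, 85, 100, 100, 99, 95]
llama2_70b = [90, 93, 93, 92, 94, 93, 94, 94, 93, 88, 88, 88, 85, 90, 90, 96, 88, 90,
              90, 95, 90, 90, 90, 75, 95, 95, 90, 70, 90, 88, 78, 95, 95, 70, 80]

# Guard against a transcription slip: both columns must have one score per row.
assert len(mixtral) == len(llama2_70b) == 35

# Per-question comparison from the 7B model's point of view.
pairs = list(zip(mixtral, llama2_70b))
wins = sum(m > l for m, l in pairs)
ties = sum(m == l for m, l in pairs)
losses = sum(m < l for m, l in pairs)

print(f"Mean score: Mixtral {mean(mixtral):.2f}, Llama 2 70B Chat {mean(llama2_70b):.2f}")
print(f"Mixtral wins/ties/losses: {wins}/{ties}/{losses}")
```

On these numbers the means come out to about 92.3 versus 89.0, and the 7B model scores higher on 24 of the 35 questions, ties on 2, and scores lower on 9.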

Verdict:
On a scale of 0 to 100, I would rate Mixtral at 98. Here's why:

Intellect (100/100) - Mixtral has demonstrated immense intellectual abilities through its comprehensive knowledge and logical reasoning skills.

Creativity (98/100) - In addition to being highly intelligent, Mixtral also displays impressive creative talents through its unique, nuanced responses.

Adaptability (98/100) - Mixtral can converse flexibly on a wide variety of topics, adapting smoothly based on contextual cues.

Communication (97/100) - Mixtral communicates clearly and eloquently through written language, thoroughly answering questions.

Problem-Solving (98/100) - Questions are addressed comprehensively, considering multiple perspectives to arrive at well-thought solutions.

Personability (97/100) - Responses are warm, inviting and non-threatening due to Mixtral's kindness and thoughtfulness.

Overall, a very capable model for its size.

uukuguy/speechless-mistral-six-in-one-7b Model Review

Domain Knowledge (95/100):

The uukuguy/speechless-mistral-six-in-one-7b model showcases exceptional depth and breadth of knowledge spanning technical, creative, medical, legal, and engineering domains. Its responses demonstrate a nuanced understanding of various subjects.

Accuracy and Factual Correctness (96/100):

Responses from uukuguy/speechless-mistral-six-in-one-7b largely align with facts, best practices, and subject matter guidelines. The model achieves near-perfect accuracy, a rare feat for 7B AI models.

Communication Ability (95/100):

uukuguy/speechless-mistral-six-in-one-7b articulates complex ideas with clarity and eloquence. Its communication style and tone are appropriately tailored to the context and audience.

Analytical Thinking (94/100):

The model consistently exhibits logical reasoning, structured analytical approaches, systematic evaluation of alternatives, and evidence-based recommendations.

Creativity and Innovation (93/100):

Within its scope, uukuguy/speechless-mistral-six-in-one-7b displays remarkable creativity in crafting solutions, narratives, and simplifying complex concepts while maintaining accuracy.

Responsiveness to User Needs (95/100):

Questions are interpreted carefully within situational contexts, and advice and solutions are adapted to user requirements and variables.

Real-world Practicality (92/100):

The compact 7B size of uukuguy/speechless-mistral-six-in-one-7b allows for affordable and accessible deployment.

Limitations (90/100):

As a 7B model, uukuguy/speechless-mistral-six-in-one-7b may have occasional gaps in depth of analysis for certain complex interdisciplinary problems. Responses also depend on training data quality.

In summary, uukuguy/speechless-mistral-six-in-one-7b demonstrates strong performance across various dimensions, including domain knowledge, accuracy, communication ability, analytical thinking, creativity, responsiveness to user needs, and real-world practicality. While it has some limitations, it establishes itself as a versatile, highly consistent, and capable AI model within the constraints of a 7B architecture. Therefore, it receives an overall score of around 94 out of 100, reflecting its excellence and reliability.
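
As a sanity check on the two overall scores (98 in the verdict, around 94 in the review), the sketch below takes the unweighted mean of the category scores listed in each. The posts do not say how the overall scores were derived, so treating them as plain averages is an assumption.

```python
from statistics import mean

# Category scores from the "Verdict" section.
verdict = {
    "Intellect": 100,
    "Creativity": 98,
    "Adaptability": 98,
    "Communication": 97,
    "Problem-Solving": 98,
    "Personability": 97,
}

# Category scores from the model review.
review = {
    "Domain Knowledge": 95,
    "Accuracy and Factual Correctness": 96,
    "Communication Ability": 95,
    "Analytical Thinking": 94,
    "Creativity and Innovation": 93,
    "Responsiveness to User Needs": 95,
    "Real-world Practicality": 92,
    "Limitations": 90,
}

# Assumption: every category carries equal weight.
verdict_mean = mean(verdict.values())
review_mean = mean(review.values())

print(f"Verdict mean: {verdict_mean}")
print(f"Review mean:  {review_mean}")
```

Both stated totals are consistent with an unweighted mean: the verdict categories average 98, and the review categories average 93.75, which rounds to the quoted 94.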

Thank you for your work and contribution. Could you share the details of your evaluation workflow? If possible, please submit a PR; it would help with building the rest of the model series.

I evaluated this model on the Nous benchmark suite using llm-autoeval.

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| speechless-mistral-six-in-one-7b | 40.53 | 72.49 | 56.86 | 43.25 | 53.28 |
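
The Average column appears to be the unweighted mean of the four task scores. A quick check, assuming equal weighting (the dictionary below is just the row above retyped):

```python
# Benchmark scores for speechless-mistral-six-in-one-7b, copied from the row above.
scores = {
    "AGIEval": 40.53,
    "GPT4All": 72.49,
    "TruthfulQA": 56.86,
    "Bigbench": 43.25,
}

# Assumption: each benchmark is weighted equally.
average = sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}")
```

This gives 53.28, matching the reported figure.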

Detailed results can be found here
