metadata
license: apache-2.0
language:
  - en
tags:
  - mistral
  - text-generation-inference
  - conversational
  - finetuned

Ladybird-base-7B-v8

Welcome to the repository of Ladybird-base-7B-v8, a 7B-parameter Large Language Model (LLM) developed through extensive research and hands-on learning in Artificial Intelligence (AI), with a particular focus on LLMs. This model represents a significant milestone in my journey to understand and contribute to the advancement of AI technologies.

About the Creator

As an avid learner and researcher of AI, I set out not only to understand Large Language Models but also to contribute to the field. Building and fine-tuning my own models let me engage deeply with the intricacies of AI, which culminated in Ladybird-base-7B-v8. This project reflects my dedication to learning and my passion for pushing the boundaries of what AI models can achieve.

Model Overview

Ladybird-base-7B-v8 is based on the Mistral architecture, which is designed for efficient inference on complex language understanding and generation tasks. It inherits the following architectural choices from Mistral:

  • Grouped-Query Attention: Query heads are split into groups that each share a single key/value head, which shrinks the key/value cache and speeds up inference while largely preserving model quality.
  • Sliding-Window Attention: Each token attends only to a fixed window of preceding tokens, so attention cost grows linearly rather than quadratically with sequence length. Because layers are stacked, information from beyond the window can still propagate through the network, which preserves long-range context.
  • Byte-fallback BPE Tokenizer: Text is tokenized with Byte-Pair Encoding (BPE), and any character missing from the vocabulary is encoded as its individual UTF-8 bytes instead of an unknown token, so every input string can be represented.
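The three design choices above can be illustrated with a small, dependency-free Python sketch. It is a toy model, not Mistral's or this model's implementation: all function names are hypothetical, the mask convention (each token sees at most `window` positions, itself included) is an assumption, the layer and head counts are assumed from the public Mistral-7B-v0.1 configuration rather than read from Ladybird-base-7B-v8, and the byte-fallback tokenizer works one character at a time instead of applying real BPE merges.

```python
from typing import List

# Toy illustrations of Mistral-style design choices (hypothetical helpers, pure Python).


def sliding_window_causal_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Causal: j <= i. Sliding window (assumed convention): only the `window` most
    recent positions, including i itself, are visible.
    """
    return [[(j <= i) and (i - j < window) for j in range(seq_len)]
            for i in range(seq_len)]


def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Grouped-query attention: each block of consecutive query heads shares one K/V head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("query heads must divide evenly into K/V groups")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes of keys plus values cached per token across all layers (2 bytes = fp16)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def byte_fallback_tokenize(text: str, vocab: set) -> List[str]:
    """Toy byte fallback: characters missing from `vocab` become UTF-8 byte tokens.

    Real BPE also merges frequent multi-character pieces; this toy works one
    character at a time to isolate the fallback step.
    """
    tokens: List[str] = []
    for ch in text:
        if ch in vocab:
            tokens.append(ch)
        else:
            tokens.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return tokens


mask = sliding_window_causal_mask(seq_len=6, window=3)
print(sum(mask[5]))  # 3: position 5 sees positions 3, 4 and 5
# Assumed Mistral-7B-v0.1 values: 32 layers, 32 query heads, 8 K/V heads, head_dim 128.
print(kv_head_for_query_head(13, n_q_heads=32, n_kv_heads=8))  # 3
print(kv_cache_bytes_per_token(32, 8, 128))   # 131072 bytes (128 KiB) with 8 K/V heads
print(kv_cache_bytes_per_token(32, 32, 128))  # 524288 bytes (512 KiB) with one K/V head per query head
print(byte_fallback_tokenize("hi€", vocab={"h", "i"}))  # ['h', 'i', '<0xE2>', '<0x82>', '<0xAC>']
```

Under these assumed numbers, sharing each key/value head across four query heads cuts the per-token cache to a quarter of what full multi-head attention would need.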

Instruction Format

Ladybird-base-7B-v8 is instruction fine-tuned, so to get accurate, context-aware responses you should follow a specific prompt format. Construct your prompts as follows:

<s>[INST] Your specific instruction here [/INST]Additional context or instruction details go here</s>
  • Begin the prompt with <s>[INST], followed by your specific request or command.
  • Close the instruction with [/INST]. If additional context or details are necessary, place them after this closing tag.
  • Terminate the sequence with the end-of-sequence token </s>.
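A minimal Python sketch of these rules, assuming plain string assembly is all that is needed. `build_ladybird_prompt` is a hypothetical helper, not part of this repository; it reproduces the format exactly as written here, with optional context placed directly after `[/INST]` and the sequence closed by `</s>`.

```python
def build_ladybird_prompt(instruction: str, context: str = "") -> str:
    """Assemble a prompt in the <s>[INST] ... [/INST]...</s> format (hypothetical helper).

    `instruction` goes between [INST] and [/INST]; optional `context` follows
    [/INST] with no separating space, and </s> ends the sequence.
    """
    instruction = instruction.strip()
    if not instruction:
        raise ValueError("instruction must not be empty")
    return f"<s>[INST] {instruction} [/INST]{context.strip()}</s>"


prompt = build_ladybird_prompt(
    "What are the primary benefits of using renewable energy sources?",
    "You are an AI trained to provide comprehensive and concise answers",
)
print(prompt)
```

Many Mistral-family tokenizers prepend `<s>` automatically, so if you tokenize a string that already contains it with Hugging Face `transformers`, passing `add_special_tokens=False` avoids a duplicated beginning-of-sequence token.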

Example

<s>[INST] What are the primary benefits of using renewable energy sources? [/INST]You are an AI trained to provide comprehensive and concise answers</s>

Eval results

Tasks                           Version Filter           n-shot Metric      Value   Stderr
winogrande                      1       none             None   acc         0.8272  ± 0.0106
truthfulqa_mc2                  2       none             0      acc         0.7736  ± 0.0139
truthfulqa_mc1                  2       none             0      acc         0.6242  ± 0.0170
stem                            N/A     none             None   acc         0.5109  ± 0.0085
 - abstract_algebra             0       none             None   acc         0.2900  ± 0.0456
 - anatomy                      0       none             None   acc         0.5852  ± 0.0426
 - astronomy                    0       none             None   acc         0.6908  ± 0.0376
 - college_biology              0       none             None   acc         0.6875  ± 0.0388
 - college_chemistry            0       none             None   acc         0.4000  ± 0.0492
 - college_computer_science     0       none             None   acc         0.5300  ± 0.0502
 - college_mathematics          0       none             None   acc         0.2600  ± 0.0441
 - college_physics              0       none             None   acc         0.4314  ± 0.0493
 - computer_security            0       none             None   acc         0.7100  ± 0.0456
 - conceptual_physics           0       none             None   acc         0.5702  ± 0.0324
 - electrical_engineering       0       none             None   acc         0.5586  ± 0.0414
 - elementary_mathematics       0       none             None   acc         0.4259  ± 0.0255
 - high_school_biology          0       none             None   acc         0.7710  ± 0.0239
 - high_school_chemistry        0       none             None   acc         0.4483  ± 0.0350
 - high_school_computer_science 0       none             None   acc         0.7000  ± 0.0461
 - high_school_mathematics      0       none             None   acc         0.3259  ± 0.0286
 - high_school_physics          0       none             None   acc         0.3179  ± 0.0380
 - high_school_statistics       0       none             None   acc         0.4491  ± 0.0339
 - machine_learning             0       none             None   acc         0.5000  ± 0.0475
hellaswag                       1       none             None   acc         0.7010  ± 0.0046
                                        none             None   acc_norm    0.8763  ± 0.0033
gsm8k                           3       strict-match     5      exact_match 0.7650  ± 0.0117
                                        flexible-extract 5      exact_match 0.7695  ± 0.0116
arc_challenge                   1       none             None   acc         0.6749  ± 0.0137
                                        none             None   acc_norm    0.6800  ± 0.0136
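The Stderr column can be turned into an approximate confidence interval for each score. A minimal sketch, assuming the scores are roughly normally distributed so that value ± 1.96 × stderr covers about 95%; `approx_ci95` is a hypothetical helper, and the (value, stderr) pairs are copied from the table above.

```python
from typing import Tuple


def approx_ci95(value: float, stderr: float, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation interval: value ± z * stderr (z = 1.96 for ~95%)."""
    half_width = z * stderr
    return value - half_width, value + half_width


# (value, stderr) pairs copied from the evaluation table.
results = {
    "winogrande acc": (0.8272, 0.0106),
    "truthfulqa_mc2 acc": (0.7736, 0.0139),
    "abstract_algebra acc": (0.2900, 0.0456),
}

for name, (value, stderr) in results.items():
    low, high = approx_ci95(value, stderr)
    print(f"{name}: {value:.4f}, 95% CI ~ [{low:.4f}, {high:.4f}]")
```

Scores with large stderr values, such as abstract_algebra, come out with wide intervals (roughly 0.20 to 0.38 here), so small differences between such rows should be read cautiously. Each interval describes the sampling uncertainty of one score on its own; it is not a significance test between two rows.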

Contact


rafal.jackiewicz@gmail.com