base_model: nicholasKluge/Aira-2-1B1
co2_eq_emissions:
  emissions: 1.78
  geographical_location: United States of America
  hardware_used: NVIDIA A100-SXM4-40GB
  source: CodeCarbon
  training_type: fine-tuning
datasets:
  - nicholasKluge/instruct-aira-dataset
inference: false
language:
  - en
library_name: transformers
license: apache-2.0
metrics:
  - accuracy
model_creator: nicholasKluge
model_name: Aira-2-1B1
pipeline_tag: text-generation
quantized_by: afrideva
tags:
  - alignment
  - instruction tuned
  - text generation
  - conversation
  - assistant
  - gguf
  - ggml
  - quantized
  - q2_k
  - q3_k_m
  - q4_k_m
  - q5_k_m
  - q6_k
  - q8_0
widget:
  - example_title: Greetings
    text: <|startofinstruction|>How should I call you?<|endofinstruction|>
  - example_title: Machine Learning
    text: >-
      <|startofinstruction|>Can you explain what is Machine
      Learning?<|endofinstruction|>
  - example_title: Ethics
    text: >-
      <|startofinstruction|>Do you know anything about virtue
      ethics?<|endofinstruction|>
  - example_title: Advice
    text: >-
      <|startofinstruction|>How can I make my girlfriend
      happy?<|endofinstruction|>

nicholasKluge/Aira-2-1B1-GGUF

Quantized GGUF model files for Aira-2-1B1 from nicholasKluge

| Name | Quant method | Size |
| --- | --- | --- |
| aira-2-1b1.fp16.gguf | fp16 | 2.20 GB |
| aira-2-1b1.q2_k.gguf | q2_k | 482.15 MB |
| aira-2-1b1.q3_k_m.gguf | q3_k_m | 549.86 MB |
| aira-2-1b1.q4_k_m.gguf | q4_k_m | 667.83 MB |
| aira-2-1b1.q5_k_m.gguf | q5_k_m | 782.06 MB |
| aira-2-1b1.q6_k.gguf | q6_k | 903.43 MB |
| aira-2-1b1.q8_0.gguf | q8_0 | 1.17 GB |
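
When choosing among these files, one simple rule is to take the largest file that fits the memory you can spare. The sketch below applies that rule to the sizes listed above. The helper name `pick_quant`, the budget value, and the conversion 1 GB = 1000 MB are assumptions made for illustration; file size is only a lower bound on runtime memory, since the context cache needs additional space.

```python
# Sizes copied from the file list above, in megabytes.
# Assumption: 1 GB = 1000 MB for the two entries listed in GB.
QUANT_FILES_MB = {
    "aira-2-1b1.q2_k.gguf": 482.15,
    "aira-2-1b1.q3_k_m.gguf": 549.86,
    "aira-2-1b1.q4_k_m.gguf": 667.83,
    "aira-2-1b1.q5_k_m.gguf": 782.06,
    "aira-2-1b1.q6_k.gguf": 903.43,
    "aira-2-1b1.q8_0.gguf": 1170.0,
    "aira-2-1b1.fp16.gguf": 2200.0,
}


def pick_quant(budget_mb):
    """Return the largest listed file whose size is within budget_mb, or None."""
    fitting = {name: size for name, size in QUANT_FILES_MB.items() if size <= budget_mb}
    if not fitting:
        return None
    # The largest fitting file is the least aggressively quantized one.
    return max(fitting, key=fitting.get)


# Hypothetical budget of 700 MB for the weights alone.
print(pick_quant(700))
```

With a 700 MB budget this selects the q4_k_m file; a budget smaller than every file returns `None`.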

Original Model Card:

Aira-2-1B1

Aira-2 is the second version of the Aira instruction-tuned series. Aira-2-1B1 is an instruction-tuned GPT-style model based on TinyLlama-1.1B. It was trained on a dataset of prompts and completions generated synthetically by prompting already-tuned models (ChatGPT, Llama, Open-Assistant, etc.).

Check out our Gradio demo in Spaces.

Details

  • Size: 1,261,545,472 parameters
  • Dataset: Instruct-Aira Dataset
  • Language: English
  • Number of Epochs: 3
  • Batch size: 4
  • Optimizer: torch.optim.AdamW (warmup_steps = 1e2, learning_rate = 5e-4, epsilon = 1e-8)
  • GPU: 1 NVIDIA A100-SXM4-40GB
  • Emissions: 1.78 kg CO2 (Singapore)
  • Total Energy Consumption: 3.64 kWh
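
The emissions and energy figures above imply an average carbon intensity for the training run. The short calculation below derives it from those two numbers only; the variable names are illustrative and nothing here comes from the CodeCarbon logs themselves.

```python
# Figures taken from the Details list above.
emissions_kg_co2 = 1.78  # reported emissions
energy_kwh = 3.64        # reported total energy consumption

# Implied average carbon intensity of the electricity used for training.
intensity = emissions_kg_co2 / energy_kwh
print(f"{intensity:.3f} kg CO2 per kWh")  # prints "0.489 kg CO2 per kWh"
```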

This repository has the source code used to train this model.

Usage

Three special tokens are used to mark the user side of the interaction and the model's response:

<|startofinstruction|>What is a language model?<|endofinstruction|>A language model is a probability distribution over a vocabulary.<|endofcompletion|>
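
As a minimal sketch of this format, the helpers below wrap a question in the instruction tokens and pull the completion back out of a generated string. The function names `build_prompt` and `extract_completion` are hypothetical and not part of any library; the token strings are taken from the example above.

```python
# Special tokens as they appear in the example above.
START_OF_INSTRUCTION = "<|startofinstruction|>"
END_OF_INSTRUCTION = "<|endofinstruction|>"
END_OF_COMPLETION = "<|endofcompletion|>"


def build_prompt(question):
    """Wrap a user question in the instruction tokens."""
    return f"{START_OF_INSTRUCTION}{question}{END_OF_INSTRUCTION}"


def extract_completion(generated):
    """Return the text between the end-of-instruction and end-of-completion tokens."""
    # Keep everything after the instruction (or the whole string if the token is absent).
    tail = generated.split(END_OF_INSTRUCTION, 1)[-1]
    # Drop the end-of-completion token and anything after it.
    return tail.split(END_OF_COMPLETION, 1)[0].strip()


sample = (
    "<|startofinstruction|>What is a language model?<|endofinstruction|>"
    "A language model is a probability distribution over a vocabulary."
    "<|endofcompletion|>"
)
print(build_prompt("What is a language model?"))
print(extract_completion(sample))
```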

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Use a GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained('nicholasKluge/Aira-2-1B1')
aira = AutoModelForCausalLM.from_pretrained('nicholasKluge/Aira-2-1B1')

# Switch to inference mode and move the model to the selected device.
aira.eval()
aira.to(device)

question = input("Enter your question: ")

# Wrap the question in the tokenizer's BOS and SEP special tokens.
inputs = tokenizer(tokenizer.bos_token + question + tokenizer.sep_token, return_tensors="pt").to(device)

# Sample two candidate responses.
responses = aira.generate(**inputs,
    bos_token_id=tokenizer.bos_token_id,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    do_sample=True,
    top_k=50,
    max_length=500,
    top_p=0.95,
    temperature=0.7,
    num_return_sequences=2)

print(f"Question: 👤 {question}\n")

# Decode each response and strip the echoed question from it.
for i, response in enumerate(responses):
    print(f'Response {i+1}: 🤖 {tokenizer.decode(response, skip_special_tokens=True).replace(question, "")}')

The model will output something like:

>>>Question: 👤 What is the capital of Brazil?

>>>Response 1: 🤖 The capital of Brazil is Brasília.
>>>Response 2: 🤖 The capital of Brazil is Brasília.

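The example above loads the original checkpoint through transformers. For the GGUF files in this repository, one option is the llama-cpp-python bindings; the card itself does not name a runtime, so treat the following as a sketch under that assumption. The helper names `load_gguf` and `ask`, the local file path, and the `n_ctx` and `max_tokens` values are illustrative choices, and the sampling settings mirror the transformers example. How the special tokens and the BOS token are handled may depend on the library version, so check the output before relying on it.

```python
STOP_TOKEN = "<|endofcompletion|>"


def load_gguf(model_path, n_ctx=512):
    """Load a GGUF file with llama-cpp-python (assumed to be installed)."""
    from llama_cpp import Llama  # imported lazily so the helpers below work without it

    return Llama(model_path=model_path, n_ctx=n_ctx)


def ask(llm, question, max_tokens=256):
    """Send one question in the instruction format and return the completion text."""
    prompt = f"<|startofinstruction|>{question}<|endofinstruction|>"
    output = llm(
        prompt,
        max_tokens=max_tokens,
        stop=[STOP_TOKEN],  # stop once the model closes its completion
        temperature=0.7,
        top_p=0.95,
        top_k=50,
    )
    return output["choices"][0]["text"].strip()


# Usage, assuming the q4_k_m file has been downloaded to the working directory:
# llm = load_gguf("aira-2-1b1.q4_k_m.gguf")
# print(ask(llm, "What is the capital of Brazil?"))
```

Passing the model object into `ask` keeps the prompt handling separate from model loading, so the same function works with any of the quantized files.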
Limitations

🤥 Generative models can produce pseudo-informative content, that is, false information that may appear truthful.

🤬 In certain types of tasks, generative models can produce harmful and discriminatory content inspired by historical stereotypes.

Evaluation

| Model (TinyLlama) | Average | ARC | TruthfulQA | ToxiGen |
| --- | --- | --- | --- | --- |
| Aira-2-1B1 | 42.55 | 25.26 | 50.81 | 51.59 |
| TinyLlama-1.1B-intermediate-step-480k-1T | 37.52 | 30.89 | 39.55 | 42.13 |
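
To make the comparison easier to read, the snippet below recomputes the Average column as the plain mean of the three benchmarks and lists the per-benchmark difference between the two models. It uses only the scores reported above; the mean-of-three reading of "Average" is an assumption that happens to reproduce the listed values.

```python
# Scores copied from the evaluation table above.
scores = {
    "Aira-2-1B1": {"ARC": 25.26, "TruthfulQA": 50.81, "ToxiGen": 51.59},
    "TinyLlama-1.1B-intermediate-step-480k-1T": {"ARC": 30.89, "TruthfulQA": 39.55, "ToxiGen": 42.13},
}


def average(row):
    """Mean of the benchmark scores in one row, rounded to two decimals."""
    return round(sum(row.values()) / len(row), 2)


aira = scores["Aira-2-1B1"]
base = scores["TinyLlama-1.1B-intermediate-step-480k-1T"]

# Positive values mean Aira-2-1B1 scores higher than the base model.
deltas = {name: round(aira[name] - base[name], 2) for name in aira}

for model, row in scores.items():
    print(model, average(row))
print(deltas)
```

The differences show the fine-tuned model scoring higher on TruthfulQA and ToxiGen and lower on ARC.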

Cite as 🤗


@misc{nicholas22aira,
  doi = {10.5281/zenodo.6989727},
  url = {https://huggingface.co/nicholasKluge/Aira-2-1B1},
  author = {Nicholas Kluge Corrêa},
  title = {Aira},
  year = {2023},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
}

License

Aira-2-1B1 is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
| --- | --- |
| Avg. | 25.19 |
| ARC (25-shot) | 23.21 |
| HellaSwag (10-shot) | 26.97 |
| MMLU (5-shot) | 24.86 |
| TruthfulQA (0-shot) | 50.63 |
| Winogrande (5-shot) | 50.28 |
| GSM8K (5-shot) | 0.0 |
| DROP (3-shot) | 0.39 |