---
license: apache-2.0
base_model: pszemraj/tinyllama-1.1b-3T
tags:
  - bees
  - bzz
  - honey
  - oprah winfrey
metrics:
  - accuracy
inference:
  parameters:
    max_new_tokens: 64
    do_sample: true
    renormalize_logits: true
    repetition_penalty: 1.05
    no_repeat_ngram_size: 6
    temperature: 0.9
    top_p: 0.95
    epsilon_cutoff: 0.0008
widget:
  - text: In beekeeping, the term "queen excluder" refers to
    example_title: Queen Excluder
  - text: One way to encourage a honey bee colony to produce more honey is by
    example_title: Increasing Honey Production
  - text: The lifecycle of a worker bee consists of several stages, starting with
    example_title: Lifecycle of a Worker Bee
  - text: Varroa destructor is a type of mite that
    example_title: Varroa Destructor
  - text: In the world of beekeeping, the acronym PPE stands for
    example_title: Beekeeping PPE
  - text: The term "robbing" in beekeeping refers to the act of
    example_title: Robbing in Beekeeping
  - text: |-
      Question: What's the primary function of drone bees in a hive?
      Answer:
    example_title: Role of Drone Bees
  - text: To harvest honey from a hive, beekeepers often use a device known as a
    example_title: Honey Harvesting Device
  - text: >-
      Problem: You have a hive that produces 60 pounds of honey per year. You
      decide to split the hive into two. Assuming each hive now produces at a
      70% rate compared to before, how much honey will you get from both hives
      next year?

      To calculate
    example_title: Beekeeping Math Problem
  - text: In beekeeping, "swarming" is the process where
    example_title: Swarming
pipeline_tag: text-generation
datasets:
  - BEE-spoke-data/bees-internal
language:
  - en
---

# TinyLlama-3T-1.1bee

A grand successor to the original: this version starts from the TinyLlama checkpoint trained for 3 trillion tokens, giving it a stronger base for all things bee.

## Model description

This model is a fine-tuned version of [pszemraj/tinyllama-1.1b-3T](https://huggingface.co/pszemraj/tinyllama-1.1b-3T) on the [BEE-spoke-data/bees-internal](https://huggingface.co/datasets/BEE-spoke-data/bees-internal) dataset.

It achieves the following results on the evaluation set:

- Loss: 2.1640 (perplexity ≈ exp(2.1640) ≈ 8.71)
- Accuracy: 0.5406
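
A minimal usage sketch with 🤗 Transformers is below. It reuses the sampling parameters from the widget config above; the repo ID is assumed from this model's name and should be checked against the model page.

```python
# Sketch: load the model and generate with the widget's sampling parameters.
# Assumes transformers is installed (see Framework versions below) and that
# the repo ID is correct (unverified assumption).
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="BEE-spoke-data/TinyLlama-3T-1.1bee",  # assumed repo ID
)

prompt = 'In beekeeping, the term "queen excluder" refers to'
out = pipe(
    prompt,
    max_new_tokens=64,
    do_sample=True,
    renormalize_logits=True,
    repetition_penalty=1.05,
    no_repeat_ngram_size=6,
    temperature=0.9,
    top_p=0.95,
    epsilon_cutoff=0.0008,
)
print(out[0]["generated_text"])
```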

## Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 2
- seed: 13707
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 2.0
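
For reference, here is a sketch of how these settings map onto 🤗 `TrainingArguments`. This is an illustrative reconstruction, not the actual training script; `output_dir` is a placeholder and a single device is assumed.

```python
# Illustrative mapping of the reported hyperparameters onto TrainingArguments.
# Single device assumed: 4 (per-device) x 16 (accumulation) = 64 total batch.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="tinyllama-3t-1.1bee",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=2,
    seed=13707,
    gradient_accumulation_steps=16,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=2.0,
)
```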

## Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 2.4432        | 0.19  | 50   | 2.3850          | 0.5033   |
| 2.3655        | 0.39  | 100  | 2.3124          | 0.5129   |
| 2.374         | 0.58  | 150  | 2.2588          | 0.5215   |
| 2.3558        | 0.78  | 200  | 2.2132          | 0.5291   |
| 2.2677        | 0.97  | 250  | 2.1828          | 0.5348   |
| 2.0701        | 1.17  | 300  | 2.1788          | 0.5373   |
| 2.0766        | 1.36  | 350  | 2.1673          | 0.5398   |
| 2.0669        | 1.56  | 400  | 2.1651          | 0.5402   |
| 2.0314        | 1.75  | 450  | 2.1641          | 0.5406   |
| 2.0281        | 1.95  | 500  | 2.1639          | 0.5407   |

## Framework versions

- Transformers 4.36.2
- Pytorch 2.1.0
- Datasets 2.16.1
- Tokenizers 0.15.0