---
license: apache-2.0
base_model: pszemraj/tinyllama-1.1b-3T
tags:
  - bees
  - bzz
  - honey
  - oprah winfrey
metrics:
  - accuracy
inference:
  parameters:
    max_new_tokens: 64
    do_sample: true
    renormalize_logits: true
    repetition_penalty: 1.05
    no_repeat_ngram_size: 6
    temperature: 0.9
    top_p: 0.95
    epsilon_cutoff: 0.0008
widget:
  - text: In beekeeping, the term "queen excluder" refers to
    example_title: Queen Excluder
  - text: One way to encourage a honey bee colony to produce more honey is by
    example_title: Increasing Honey Production
  - text: The lifecycle of a worker bee consists of several stages, starting with
    example_title: Lifecycle of a Worker Bee
  - text: Varroa destructor is a type of mite that
    example_title: Varroa Destructor
  - text: In the world of beekeeping, the acronym PPE stands for
    example_title: Beekeeping PPE
  - text: The term "robbing" in beekeeping refers to the act of
    example_title: Robbing in Beekeeping
  - text: |-
      Question: What's the primary function of drone bees in a hive?
      Answer:
    example_title: Role of Drone Bees
  - text: To harvest honey from a hive, beekeepers often use a device known as a
    example_title: Honey Harvesting Device
  - text: >-
      Problem: You have a hive that produces 60 pounds of honey per year. You
      decide to split the hive into two. Assuming each hive now produces at a
      70% rate compared to before, how much honey will you get from both hives
      next year?

      To calculate
    example_title: Beekeeping Math Problem
  - text: In beekeeping, "swarming" is the process where
    example_title: Swarming
pipeline_tag: text-generation
datasets:
  - BEE-spoke-data/bees-internal
language:
  - en
---

# TinyLlama-3T-1.1bee

A grand successor to the original: this version starts from the TinyLlama checkpoint trained for 3 trillion tokens, giving it a stronger base for all things bee.

## Model description

This model is a fine-tuned version of [pszemraj/tinyllama-1.1b-3T](https://huggingface.co/pszemraj/tinyllama-1.1b-3T) on the [BEE-spoke-data/bees-internal](https://huggingface.co/datasets/BEE-spoke-data/bees-internal) dataset.

It achieves the following results on the evaluation set:

- Loss: 2.1640 (perplexity ≈ exp(2.1640) ≈ 8.71)
- Accuracy: 0.5406
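
A minimal usage sketch with 🤗 Transformers is below. It reuses the sampling parameters from the widget config above; the repo ID is assumed from this model's name and should be checked against the model page.

```python
# Sketch: load the model and generate with the widget's sampling parameters.
# Assumes transformers is installed (see Framework versions below) and that
# the repo ID is correct (unverified assumption).
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="BEE-spoke-data/TinyLlama-3T-1.1bee",  # assumed repo ID
)

prompt = 'In beekeeping, the term "queen excluder" refers to'
out = pipe(
    prompt,
    max_new_tokens=64,
    do_sample=True,
    renormalize_logits=True,
    repetition_penalty=1.05,
    no_repeat_ngram_size=6,
    temperature=0.9,
    top_p=0.95,
    epsilon_cutoff=0.0008,
)
print(out[0]["generated_text"])
```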

## Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 2
- seed: 13707
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 2.0
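
For reference, here is a sketch of how these settings map onto 🤗 `TrainingArguments`. This is an illustrative reconstruction, not the actual training script; `output_dir` is a placeholder and a single device is assumed.

```python
# Illustrative mapping of the reported hyperparameters onto TrainingArguments.
# Single device assumed: 4 (per-device) x 16 (accumulation) = 64 total batch.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="tinyllama-3t-1.1bee",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=2,
    seed=13707,
    gradient_accumulation_steps=16,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=2.0,
)
```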

## Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 2.4432        | 0.19  | 50   | 2.3850          | 0.5033   |
| 2.3655        | 0.39  | 100  | 2.3124          | 0.5129   |
| 2.374         | 0.58  | 150  | 2.2588          | 0.5215   |
| 2.3558        | 0.78  | 200  | 2.2132          | 0.5291   |
| 2.2677        | 0.97  | 250  | 2.1828          | 0.5348   |
| 2.0701        | 1.17  | 300  | 2.1788          | 0.5373   |
| 2.0766        | 1.36  | 350  | 2.1673          | 0.5398   |
| 2.0669        | 1.56  | 400  | 2.1651          | 0.5402   |
| 2.0314        | 1.75  | 450  | 2.1641          | 0.5406   |
| 2.0281        | 1.95  | 500  | 2.1639          | 0.5407   |

## Framework versions

- Transformers 4.36.2
- Pytorch 2.1.0
- Datasets 2.16.1
- Tokenizers 0.15.0