Model Card for increasing_digit_fine_tune

This model is a fine-tune of my digit fine-tune, which is itself a fine-tune of TinyLlama/TinyLlama-1.1B-Chat-v1.0. It takes the existing digit-generation model and applies a very brief round of supervised fine-tuning on sequences of even digits (multiples of 2). This paves the way for RL with a ground-truth reward function that scores increasing even digits.

Model Details

The model began life as a clone of TinyLlama/TinyLlama-1.1B-Chat-v1.0, which was fine-tuned for ~20K steps on randomly generated sequences of digits to produce the digit fine-tune. That model was then fine-tuned on a couple hundred sequences of even digits, resulting in this model.
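As a rough illustration of what the even-digit fine-tuning data could look like, here is a minimal sketch. The text format (space-separated integers), the sequence length, the value range, and the function names `make_even_sequence` and `make_dataset` are all assumptions made for this example; they are not taken from the actual training script.

```python
import random


def make_even_sequence(rng: random.Random, length: int = 8, max_value: int = 98) -> str:
    """Build one space-separated sequence of random even integers.

    The format, length, and range are assumptions for illustration only.
    """
    # Doubling a random integer guarantees every value is a multiple of 2.
    values = [2 * rng.randint(0, max_value // 2) for _ in range(length)]
    return " ".join(str(v) for v in values)


def make_dataset(n_examples: int = 200, seed: int = 0) -> list[str]:
    """Build a small, reproducible list of even-number sequences."""
    rng = random.Random(seed)
    return [make_even_sequence(rng) for _ in range(n_examples)]
```

A fixed seed keeps the dataset reproducible, which helps when comparing runs of such a short fine-tune.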

Model Description

License: Apache 2.0 (inherited from TinyLlama)
Fine-tuned from model: TinyLlama/TinyLlama-1.1B-Chat-v1.0

Uses

The intention is to test RLHF pipelines on digit generation. Because it is easy to write a parser that scores a digit sequence against a ground-truth rule, it is much easier to tell whether the model is actually learning from the reward function.
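To make the idea of a ground-truth reward concrete, here is a minimal sketch of a scorer for increasing even numbers. It assumes the model emits integers separated by whitespace or commas; the function name `score_increasing_even` and the scoring rule (the fraction of tokens that are even and strictly larger than the previous number) are hypothetical choices for this example, not the reward used in any actual RL run.

```python
import re


def score_increasing_even(text: str) -> float:
    """Return a reward in [0, 1] for a generated number sequence.

    A token earns credit when it is an even non-negative integer that is
    strictly greater than the previous numeric token. This rule is an
    assumption for illustration, not the project's actual reward.
    """
    tokens = [t for t in re.split(r"[\s,]+", text.strip()) if t]
    if not tokens:
        return 0.0

    good = 0
    previous = None
    for token in tokens:
        if not re.fullmatch(r"[0-9]+", token):
            continue  # Non-numeric tokens earn no credit.
        value = int(token)
        if value % 2 == 0 and (previous is None or value > previous):
            good += 1
        previous = value

    # Dividing by all tokens means junk output lowers the reward.
    return good / len(tokens)
```

Because the score is computed directly from the text, a rising average reward during RL is unambiguous evidence that the policy is picking up the rule.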