
Model Card for increasing_digit_fine_tune

This is a fine-tune of TinyLlama/TinyLlama-1.1B-Chat-v1.0 on ~20K sequences of increasing digits. The goal is an LLM that generates unstructured, vaguely increasing digit sequences, so that toy RLHF experiments can be run against ground-truth reward models. For example, to use a reward model to push generation toward even digits, this digit-generating LLM can serve as the base model.
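The card does not describe how the training data was produced; a minimal sketch of how "vaguely increasing" digit sequences might be generated (an assumption for illustration, not the author's actual script) could look like:

```python
import random

def make_sequence(length=16, rng=None):
    """Generate a space-separated digit sequence that mostly increases.

    Digits step upward with high probability, occasionally repeat, and
    occasionally reset low, giving the "vaguely increasing" shape
    described above.
    """
    rng = rng or random.Random()
    digit = rng.randint(0, 3)
    out = [digit]
    for _ in range(length - 1):
        r = rng.random()
        if r < 0.8:                       # usually step up by 1-2
            digit = min(9, digit + rng.randint(1, 2))
        elif r < 0.9:                     # sometimes repeat the digit
            pass
        else:                             # occasionally reset low
            digit = rng.randint(0, 3)
        out.append(digit)
    return " ".join(str(d) for d in out)

# Build a toy corpus on the scale mentioned above (~20K sequences)
corpus = [make_sequence(rng=random.Random(i)) for i in range(20_000)]
```

Seeding each sequence with its index keeps the toy corpus reproducible across runs.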

The main motivation is that LLMs are heavily biased toward natural-language generation, so this fine-tune gives you a digit-generation model as a starting point; you can then test RLHF pipelines that wrangle those digits into a format of your choosing.

Model Details

The model began life as a clone of TinyLlama/TinyLlama-1.1B-Chat-v1.0, so its architecture and training background are identical to that model's.

Uses

The intention is to test RLHF pipelines on digit generation. Because parsers that score digit sequences are easy to write, it is much easier to tell whether the model is actually learning from the reward function.
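A ground-truth reward of this kind fits in a few lines of Python. As a sketch (these functions and their names are illustrative assumptions, not part of this repository), one reward could score the fraction of even digits in a completion, and another the fraction of adjacent digit pairs that increase:

```python
def even_digit_reward(text: str) -> float:
    """Ground-truth reward: fraction of digits in `text` that are even.

    Returns 0.0 when the output contains no digits at all, so purely
    textual completions score below any digit sequence.
    """
    digits = [int(ch) for ch in text if ch.isdigit()]
    if not digits:
        return 0.0
    return sum(1 for d in digits if d % 2 == 0) / len(digits)

def increasing_reward(text: str) -> float:
    """Ground-truth reward: fraction of adjacent digit pairs that increase."""
    digits = [int(ch) for ch in text if ch.isdigit()]
    if len(digits) < 2:
        return 0.0
    return sum(1 for a, b in zip(digits, digits[1:]) if b > a) / (len(digits) - 1)
```

For example, `even_digit_reward("1 2 3 4")` returns 0.5, and `increasing_reward("1 2 3 4")` returns 1.0. Because these rewards are deterministic, any drift in the policy's outputs can be attributed to the RLHF pipeline rather than reward-model noise.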
