---
license: apache-2.0
---

# Model Card for increasing_digit_fine_tune

This is a fine-tune of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) on ~20K sequences of increasing digits. The idea is to get an LLM that generates unstructured digit sequences (vaguely increasing) so that we can set up toy RLHF experiments with ground-truth reward models. To use a reward model to drive the generation of, say, even digits, we can use this pre-trained digit-generating LLM as our base model. The main reason for this is that LLMs are _heavily_ biased towards _language_ generation (obviously), so this fine-tune starts you off with a digit-generation model, and you can then test RLHF pipelines that wrangle those digits into a format of your choosing.

## Model Details

The model began life as a clone of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0), so its architecture and training background are identical to that model's.

### Model Description

- **License:** [Apache 2.0, inherited from TinyLlama](https://github.com/jzhang38/TinyLlama/blob/main/LICENSE)
- **Finetuned from model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)

## Uses

The intention is to test RLHF pipelines on digit generation. Because we can easily write parsers that score digit sequences exactly, it is much easier to tell whether the model is actually learning from the reward function.
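
As a minimal sketch of how this fine-tune could slot into such a pipeline, the snippet below loads the model with 🤗 Transformers, samples a short continuation, and scores it with a toy ground-truth reward (the fraction of generated digits that are even). The Hub path `your-username/increasing_digit_fine_tune`, the prompt, and the generation settings are illustrative assumptions, not part of this repository.

```python
import re
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub path -- replace with the actual location of this fine-tune.
model_id = "your-username/increasing_digit_fine_tune"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Sample a short continuation; the fine-tune should emit mostly digits.
inputs = tokenizer("1 2 3", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, top_p=0.95)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Toy ground-truth reward: fraction of generated digits that are even.
digits = re.findall(r"\d", text)
reward = sum(int(d) % 2 == 0 for d in digits) / max(len(digits), 1)
print(text)
print(f"reward = {reward:.2f}")
```

Because the reward is computed by an exact parser rather than a learned model, any drift in the RLHF loop (e.g. the policy collapsing to non-digit text) is immediately visible in the reward signal.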