
Model Card for increasing_digit_fine_tune

This is a fine-tune of TinyLlama/TinyLlama-1.1B-Chat-v1.0 on ~20K sequences of increasing digits. The goal is an LLM that generates unstructured, vaguely increasing digit sequences, so that toy RLHF experiments can be run against ground-truth reward models. For example, to use a reward model to push generation toward even digits, this digit-generating LLM can serve as the base model.
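The card does not describe how the training data was produced; a minimal sketch of how "vaguely increasing" digit sequences might be generated (an assumption for illustration, not the author's actual script) could look like:

```python
import random

def make_sequence(length=16, rng=None):
    """Generate a space-separated digit sequence that mostly increases.

    Digits step upward with high probability, occasionally repeat, and
    occasionally reset low, giving the "vaguely increasing" shape
    described above.
    """
    rng = rng or random.Random()
    digit = rng.randint(0, 3)
    out = [digit]
    for _ in range(length - 1):
        r = rng.random()
        if r < 0.8:                       # usually step up by 1-2
            digit = min(9, digit + rng.randint(1, 2))
        elif r < 0.9:                     # sometimes repeat the digit
            pass
        else:                             # occasionally reset low
            digit = rng.randint(0, 3)
        out.append(digit)
    return " ".join(str(d) for d in out)

# Build a toy corpus on the scale mentioned above (~20K sequences)
corpus = [make_sequence(rng=random.Random(i)) for i in range(20_000)]
```

Seeding each sequence with its index keeps the toy corpus reproducible across runs.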

The main motivation is that LLMs are heavily biased toward natural-language generation, so this fine-tune gives you a digit-generation model as a starting point; you can then test RLHF pipelines that wrangle those digits into a format of your choosing.

Model Details

The model began life as a clone of TinyLlama/TinyLlama-1.1B-Chat-v1.0, so its architecture and training background are identical to that model's.

Uses

The intention is to test RLHF pipelines on digit generation. Because parsers that score digit sequences are easy to write, it is much easier to tell whether the model is actually learning from the reward function.
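A ground-truth reward of this kind fits in a few lines of Python. As a sketch (these functions and their names are illustrative assumptions, not part of this repository), one reward could score the fraction of even digits in a completion, and another the fraction of adjacent digit pairs that increase:

```python
def even_digit_reward(text: str) -> float:
    """Ground-truth reward: fraction of digits in `text` that are even.

    Returns 0.0 when the output contains no digits at all, so purely
    textual completions score below any digit sequence.
    """
    digits = [int(ch) for ch in text if ch.isdigit()]
    if not digits:
        return 0.0
    return sum(1 for d in digits if d % 2 == 0) / len(digits)

def increasing_reward(text: str) -> float:
    """Ground-truth reward: fraction of adjacent digit pairs that increase."""
    digits = [int(ch) for ch in text if ch.isdigit()]
    if len(digits) < 2:
        return 0.0
    return sum(1 for a, b in zip(digits, digits[1:]) if b > a) / (len(digits) - 1)
```

For example, `even_digit_reward("1 2 3 4")` returns 0.5, and `increasing_reward("1 2 3 4")` returns 1.0. Because these rewards are deterministic, any drift in the policy's outputs can be attributed to the RLHF pipeline rather than reward-model noise.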
