How to get training accuracy metrics?

#16
by opyate - opened

Hey guys,

After running training, it's common to see accuracy metrics, e.g. F1-score. I'm not seeing anything in the deepspeed output: https://pastebin.com/GiVs6EGc

I've checked the CLI options, and it doesn't seem to be something I'm missing. Is there a way to get these values?

Kind regards,

Databricks org

Metrics are logged to TensorBoard. The notebook invokes the TensorBoard display inside a Databricks notebook, but you can run TensorBoard against that log directory anywhere.

HF will also log metrics to MLflow if you set some env variables (and likewise for Weights & Biases, etc.), which also surfaces metrics and would show up in Databricks automatically.
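As a minimal sketch of the env-variable approach: the `transformers` MLflow and W&B callbacks read a few environment variables before training starts. The variable names below match recent `transformers` versions but are worth verifying against yours, and the experiment/project names are hypothetical.

```python
import os

# Read by transformers' MLflow callback (verify names for your version).
os.environ["MLFLOW_EXPERIMENT_NAME"] = "/Users/me@example.com/dolly-training"  # hypothetical
os.environ["HF_MLFLOW_LOG_ARTIFACTS"] = "TRUE"  # also upload checkpoints/logs as artifacts

# Read by the Weights & Biases callback.
os.environ["WANDB_PROJECT"] = "dolly-training"  # hypothetical project name

# Then enable the integrations when building the trainer, e.g.:
# TrainingArguments(..., report_to=["mlflow", "wandb"])
```

Set these before constructing the `Trainer`; on Databricks the MLflow tracking URI is already configured, so runs appear in the workspace automatically.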

Thanks for that.

It looks like I lose the "runs" folder when the instance terminates.

The notebook also says:

Your log directory might be ephemeral to the cluster, which will be deleted after cluster termination or restart. You can choose a log directory under /dbfs/ to persist your logs in DBFS.

I reset the output dir to:

tensorboard_display_dir = f"{dbfs_output_dir}/runs"

But it seems to be empty.

EDIT: I fixed this by writing to local_disk0 first (as in the original code, which works), and then copying the runs data to DBFS afterwards, so I can interrogate it later:

# persist the tensorboard data
!mkdir -p /dbfs/dolly_training/tensorboards/$checkpoint_dir_name
!cp -R /local_disk0/dolly_training/$checkpoint_dir_name/runs /dbfs/dolly_training/tensorboards/$checkpoint_dir_name
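For reference, the same copy can be done in plain Python with `shutil`, which also works outside a notebook shell cell. This is a sketch: the paths mirror the shell magics above, and the function name is mine.

```python
import shutil
from pathlib import Path

def persist_runs(local_root: str, dbfs_root: str, checkpoint_dir_name: str) -> Path:
    """Copy the ephemeral TensorBoard `runs` dir to persistent storage."""
    src = Path(local_root) / checkpoint_dir_name / "runs"
    dst = Path(dbfs_root) / checkpoint_dir_name / "runs"
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dst, dirs_exist_ok=True)  # Python 3.8+: merges into existing dir
    return dst

# In the notebook this would be (paths as in the shell cells above):
# persist_runs("/local_disk0/dolly_training", "/dbfs/dolly_training/tensorboards", checkpoint_dir_name)
```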

Here are some screenshots of tensorboard for my 3b run. Not sure where to find accuracy scores (like F1, etc).

Screenshot from 2023-05-11 11-23-10.png

Screenshot from 2023-05-11 11-23-47.png

Databricks org

Yes, you can write to a permanent storage location like /dbfs/...; MLflow will also persist metrics, including TensorBoard logs, for you.
I don't think F1 makes sense for a causal language model. What would it mean or measure?

Not necessarily F1, but an equivalent score that gauges accuracy during validation?

Databricks org

That is more or less what (cross-entropy) loss is measuring. The task is predicting the next word over and over, and while token-level accuracy is coherent, it's not as useful for assessing how confidently correct those predictions are.
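To make the distinction concrete, here's a small sketch in plain Python (no framework, toy numbers of my own): cross-entropy averages the negative log-probability the model assigned to each correct next token, and exponentiating it gives perplexity, while accuracy only checks whether the top prediction matched.

```python
import math

# Probabilities a hypothetical model assigned to the *correct* next token
# at each position, plus whether that token was the model's top pick.
p_correct = [0.9, 0.6, 0.05, 0.7]
was_top_pick = [True, True, False, True]

# Cross-entropy loss: mean negative log-probability of the correct tokens.
loss = -sum(math.log(p) for p in p_correct) / len(p_correct)
perplexity = math.exp(loss)

# Token-level accuracy: fraction of positions where the argmax was correct.
accuracy = sum(was_top_pick) / len(was_top_pick)

# Accuracy scores the 0.9 and the 0.6 identically; loss does not.
print(f"loss={loss:.3f}  perplexity={perplexity:.3f}  accuracy={accuracy:.2f}")
```

This is why causal LM training reports loss (or perplexity) rather than F1: there's no fixed label set to compute precision and recall over.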

Thanks for the answers, Sean!

opyate changed discussion status to closed
