Spaces:

deanna-emery
/

ASL-MoViNet-T5-translator

Runtime error

App Files Files Community

ASL-MoViNet-T5-translator / official /legacy /image_classification /README.md

deanna-emery

updates

93528c6 about 1 year ago

preview code

raw

history blame

7.24 kB

	# Image Classification

	Warning: the features in the `image_classification/` directory have been
	fully integrated into the [new code base](https://github.com/tensorflow/models/tree/benchmark/official/vision/modeling/backbones).

	This folder contains TF 2 model examples for image classification:

	* [MNIST](#mnist)
	* [Classifier Trainer](#classifier-trainer), a framework that uses the Keras
	compile/fit methods for image classification models, including:
	* ResNet
	* EfficientNet[^1]

	[^1]: Currently a work in progress. We cannot match "AutoAugment (AA)" in [the original version](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet).
	For more information about other types of models, please refer to this
	[README file](../../README.md).

	## Before you begin
	Please make sure that you have the latest version of TensorFlow
	installed and add the models folder to your Python path.

	### ImageNet preparation

	#### Using TFDS
	`classifier_trainer.py` supports ImageNet with
	[TensorFlow Datasets (TFDS)](https://www.tensorflow.org/datasets/overview).

	Please see the following [example snippet](https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/scripts/download_and_prepare.py)
	for more information on how to use TFDS to download and prepare datasets, and
	specifically the [TFDS ImageNet readme](https://github.com/tensorflow/datasets/blob/master/docs/catalog/imagenet2012.md)
	for manual download instructions.

	#### Legacy TFRecords
	Download the ImageNet dataset and convert it to TFRecord format.
	The following [script](https://github.com/tensorflow/tpu/blob/master/tools/datasets/imagenet_to_gcs.py)
	and [README](https://github.com/tensorflow/tpu/tree/master/tools/datasets#imagenet_to_gcspy)
	provide a few options.

	Note that the legacy ResNet runners, e.g. [resnet/resnet_ctl_imagenet_main.py](resnet/resnet_ctl_imagenet_main.py)
	require TFRecords whereas `classifier_trainer.py` can use both by setting the
	builder to 'records' or 'tfds' in the configurations.

	### Running on Cloud TPUs

	Note: These models will not work with TPUs on Colab.

	You can train image classification models on Cloud TPUs using
	[tf.distribute.TPUStrategy](https://www.tensorflow.org/api_docs/python/tf.distribute.TPUStrategy?version=nightly).
	If you are not familiar with Cloud TPUs, it is strongly recommended that you go
	through the
	[quickstart](https://cloud.google.com/tpu/docs/quickstart) to learn how to
	create a TPU and GCE VM.

	### Running on multiple GPU hosts

	You can also train these models on multiple hosts, each with GPUs, using
	[tf.distribute.experimental.MultiWorkerMirroredStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy).

	The easiest way to run multi-host benchmarks is to set the
	[`TF_CONFIG`](https://www.tensorflow.org/guide/distributed_training#TF_CONFIG)
	appropriately at each host. e.g., to run using `MultiWorkerMirroredStrategy` on
	2 hosts, the `cluster` in `TF_CONFIG` should have 2 `host:port` entries, and
	host `i` should have the `task` in `TF_CONFIG` set to `{"type": "worker",
	"index": i}`. `MultiWorkerMirroredStrategy` will automatically use all the
	available GPUs at each host.

	## MNIST

	To download the data and run the MNIST sample model locally for the first time,
	run one of the following command:

	<details>
	```bash
	python3 mnist_main.py \
	--model_dir=$MODEL_DIR \
	--data_dir=$DATA_DIR \
	--train_epochs=10 \
	--distribution_strategy=one_device \
	--num_gpus=$NUM_GPUS \
	--download
	```
	</details>

	To train the model on a Cloud TPU, run the following command:

	<details>
	```bash
	python3 mnist_main.py \
	--tpu=$TPU_NAME \
	--model_dir=$MODEL_DIR \
	--data_dir=$DATA_DIR \
	--train_epochs=10 \
	--distribution_strategy=tpu \
	--download
	```
	</details>

	Note: the `--download` flag is only required the first time you run the model.

	## Classifier Trainer
	The classifier trainer is a unified framework for running image classification
	models using Keras's compile/fit methods. Experiments should be provided in the
	form of YAML files, some examples are included within the configs/examples
	folder. Please see [configs/examples](./configs/examples) for more example
	configurations.

	The provided configuration files use a per replica batch size and is scaled
	by the number of devices. For instance, if `batch size` = 64, then for 1 GPU
	the global batch size would be 64 * 1 = 64. For 8 GPUs, the global batch size
	would be 64 * 8 = 512. Similarly, for a v3-8 TPU, the global batch size would
	be 64 * 8 = 512, and for a v3-32, the global batch size is 64 * 32 = 2048.

	### ResNet50

	#### On GPU:

	<details>
	```bash
	python3 classifier_trainer.py \
	--mode=train_and_eval \
	--model_type=resnet \
	--dataset=imagenet \
	--model_dir=$MODEL_DIR \
	--data_dir=$DATA_DIR \
	--config_file=configs/examples/resnet/imagenet/gpu.yaml \
	--params_override='runtime.num_gpus=$NUM_GPUS'
	```
	</details>

	To train on multiple hosts, each with GPUs attached using
	[MultiWorkerMirroredStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy)
	please update `runtime` section in gpu.yaml
	(or override using `--params_override`) with:

	<details>

	```YAML
	# gpu.yaml
	runtime:
	distribution_strategy: 'multi_worker_mirrored'
	worker_hosts: '$HOST1:port,$HOST2:port'
	num_gpus: $NUM_GPUS
	task_index: 0
	```
	</details>

	By having `task_index: 0` on the first host and `task_index: 1` on the second
	and so on. `$HOST1` and `$HOST2` are the IP addresses of the hosts, and `port`
	can be chosen any free port on the hosts. Only the first host will write
	TensorBoard Summaries and save checkpoints.

	#### On TPU:

	<details>
	```bash
	python3 classifier_trainer.py \
	--mode=train_and_eval \
	--model_type=resnet \
	--dataset=imagenet \
	--tpu=$TPU_NAME \
	--model_dir=$MODEL_DIR \
	--data_dir=$DATA_DIR \
	--config_file=configs/examples/resnet/imagenet/tpu.yaml
	```

	</details>

	### VGG-16

	#### On GPU:

	<details>
	```bash
	python3 classifier_trainer.py \
	--mode=train_and_eval \
	--model_type=vgg \
	--dataset=imagenet \
	--model_dir=$MODEL_DIR \
	--data_dir=$DATA_DIR \
	--config_file=configs/examples/vgg/imagenet/gpu.yaml \
	--params_override='runtime.num_gpus=$NUM_GPUS'
	```

	</details>

	### EfficientNet
	Note: EfficientNet development is a work in progress.
	#### On GPU:

	<details>
	```bash
	python3 classifier_trainer.py \
	--mode=train_and_eval \
	--model_type=efficientnet \
	--dataset=imagenet \
	--model_dir=$MODEL_DIR \
	--data_dir=$DATA_DIR \
	--config_file=configs/examples/efficientnet/imagenet/efficientnet-b0-gpu.yaml \
	--params_override='runtime.num_gpus=$NUM_GPUS'
	```

	</details>

	#### On TPU:

	<details>
	```bash
	python3 classifier_trainer.py \
	--mode=train_and_eval \
	--model_type=efficientnet \
	--dataset=imagenet \
	--tpu=$TPU_NAME \
	--model_dir=$MODEL_DIR \
	--data_dir=$DATA_DIR \
	--config_file=configs/examples/efficientnet/imagenet/efficientnet-b0-tpu.yaml
	```
	</details>

	Note that the number of GPU devices can be overridden in the command line using
	`--params_overrides`. The TPU does not need this override as the device is fixed
	by providing the TPU address or name with the `--tpu` flag.