# Performance Comparison: Image Classification Transfer Learning with TensorFlow and the Intel® Transfer Learning Tool

This notebook uses the TensorFlow libraries to do transfer learning with an image classification model. The model is exported, evaluated, and used to generate predictions. The same sequence is then repeated using the Intel Transfer Learning Tool, which is also used to optimize and quantize the trained model.

Graphs are generated to visually compare:
* Training metrics (time per epoch, accuracy by epoch, loss by epoch)
* Evaluation metrics (time to evaluate the validation dataset, accuracy using the validation data)
* Prediction time for a single batch
* Latency and throughput for the trained models, quantized model, and the optimized model.

The notebook has variables that control parameters such as the model name, the dataset, the number of training epochs, and the batch size(s).

In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' 

import math
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import numpy as np
import pandas as pd
import psutil
import random
import tempfile
import tensorflow as tf
import tensorflow_hub as hub
import warnings

from tlt.datasets import dataset_factory
from tlt.models import model_factory
from tlt.utils.file_utils import download_and_extract_tar_file
from tlt.utils.platform_util import CPUInfo, OptimizedPlatformUtil, PlatformUtil
from utils import inc_utils

# Ignore all warnings
warnings.filterwarnings('ignore')
tf.get_logger().setLevel('ERROR')

# Specify the default dataset directory
dataset_directory = os.environ["DATASET_DIR"] if "DATASET_DIR" in os.environ else \
    os.path.join(os.environ["HOME"], "dataset")

# Specify a directory for output (saved models and checkpoints)
output_directory = os.environ["OUTPUT_DIR"] if "OUTPUT_DIR" in os.environ else \
    os.path.join(os.environ["HOME"], "output")

print("Output directory:", output_directory)

# TF Hub cache directory
os.environ["TFHUB_CACHE_DIR"] = os.path.join(output_directory, ".cache", "tfhub_modules")

# Data Frame styles
table_styles =[{
    'selector': 'caption',
    'props': [
        ('text-align', 'center'),
        ('color', 'black'),
        ('font-size', '16px')
    ]
}]

# Colors used in charts
orange = '#ff6f00'
blue = '#0071c5'
dark_blue = '#003c71'
yellow = '#f3d54e'

# Caption style for DataFrames
caption_style = [dict(selector="caption", props=[("text-align", "center"), ("font-size", "14pt"), ("color", "black")])]

# Line styles
line_styles = ['solid', 'dotted', 'dashed', 'dashdot']

# Marker styles
marker_styles = ['o', 'D', 's', 'v']
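
The two directory lookups in the cell above follow the same pattern: use an environment variable if it is set, otherwise fall back to a folder under the home directory. A minimal sketch of that pattern as a helper is shown here. The function name `resolve_directory` and its `environ` parameter are hypothetical and are not part of the notebook or the Intel Transfer Learning Tool.

```python
import os


def resolve_directory(env_name, default_subdir, environ=None):
    """Return environ[env_name] if set, else <home>/<default_subdir>.

    `environ` is a hypothetical parameter that makes the helper easy to test;
    it defaults to the real process environment.
    """
    environ = os.environ if environ is None else environ
    home = environ.get("HOME", os.path.expanduser("~"))
    return environ.get(env_name, os.path.join(home, default_subdir))


# Example with an explicit mapping instead of the real environment
fake_env = {"HOME": "/home/user", "OUTPUT_DIR": "/tmp/output"}
dataset_dir_example = resolve_directory("DATASET_DIR", "dataset", fake_env)
output_dir_example = resolve_directory("OUTPUT_DIR", "output", fake_env)
print("Dataset directory:", dataset_dir_example)
print("Output directory:", output_dir_example)
```

Using `dict.get` with a default avoids repeating the variable name in an `if ... else` expression, and falling back to `os.path.expanduser("~")` keeps the helper usable when `HOME` is not defined.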

## 1. Display Platform Information

Use the `CPUInfo` and `PlatformUtil` classes to get and display information about the platform and the TensorFlow version.

In [None]:
# Get and display CPU/platform information
cpu_info = CPUInfo()
platform_util = PlatformUtil()
print("{0} CPU Information {0}".format("=" * 20))
print("CPU family:", platform_util.cpu_family)
print("CPU model:", platform_util.cpu_model)
print("CPU type:", platform_util.cpu_type)
print("Physical cores per socket:", cpu_info.cores_per_socket)
print("Total physical cores:", cpu_info.cores)
cpufreq = psutil.cpu_freq()
print("Max Frequency:", cpufreq.max)
print("Min Frequency:", cpufreq.min)
cpu_socket_count = cpu_info.sockets
print("Socket Number:", cpu_socket_count)

print("\n{0} Memory Information {0}".format("=" * 20))
svmem = psutil.virtual_memory()
print("Total: ", int(svmem.total / (1024 ** 3)), "GB")

# Display TensorFlow version information
print("\n{0} TensorFlow Information {0}".format("=" * 20))
print("TensorFlow version:", tf.__version__)
print("TensorFlow Hub version:", hub.__version__)
major_version = int(tf.__version__.split(".")[0])
minor_version = int(tf.__version__.split(".")[1])
if major_version >= 2:
    onednn_enabled = 0
    if minor_version < 5:
        from tensorflow.python import _pywrap_util_port
    else:
        from tensorflow.python.util import _pywrap_util_port
        onednn_enabled = int(os.environ.get('TF_ENABLE_ONEDNN_OPTS', '0'))
    on_onednn = _pywrap_util_port.IsMklEnabled() or (onednn_enabled == 1)
else:
    on_onednn = tf.pywrap_tensorflow.IsMklEnabled()

print("oneDNN enabled:", on_onednn)

# Don't use the NVIDIA GPU, if there is one
os.environ['CUDA_VISIBLE_DEVICES'] = ""
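
The version check in the cell above splits the version string and compares the major and minor numbers separately. As a hedged alternative, the sketch below parses both numbers into a tuple so that a single comparison covers both parts. The helper name `parse_major_minor` is hypothetical. One reason to consider it: with separate checks, a hypothetical version such as 3.0 would satisfy `minor_version < 5` even though it is newer than 2.5.

```python
import re


def parse_major_minor(version):
    """Extract (major, minor) integers from a version string such as '2.11.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("Unrecognized version string: {}".format(version))
    return int(match.group(1)), int(match.group(2))


# Tuples compare element by element, so (3, 0) is correctly newer than (2, 5)
print(parse_major_minor("2.11.0"))
print(parse_major_minor("2.4.1") < (2, 5))
print(parse_major_minor("3.0.0") < (2, 5))
```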

## 2. Select a model and define parameters to use during training and evaluation

### Select a model

See the list of supported image classification models from TensorFlow Hub in the Intel Transfer Learning Tool.

In [None]:
framework = 'tensorflow'
use_case = 'image_classification'
model_hub = 'TFHub'
supported_models = model_factory.get_supported_models(framework, use_case)
supported_models = supported_models[use_case]

# Filter to only get relevant models
supported_models = { key:value for (key,value) in supported_models.items() if value[framework]['model_hub'] == model_hub}

print("Supported {} models for {} from {}".format(framework, use_case, model_hub))
print("=" * 70)
for model_name in supported_models.keys():
    print(model_name)
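
To make the filtering step easier to follow, here is a small sketch that applies the same dictionary comprehension to a hand-written dictionary. The nesting (model name, then framework, then `model_hub`) mirrors how the cell above indexes the result, but the entries and values below are invented for illustration and are not the tool's real model list.

```python
# Hypothetical stand-in for the dictionary returned by the model factory
example_models = {
    "resnet_v1_50": {"tensorflow": {"model_hub": "TFHub", "image_size": 224}},
    "example_model_a": {"tensorflow": {"model_hub": "Keras", "image_size": 224}},
    "example_model_b": {"tensorflow": {"model_hub": "TFHub", "image_size": 299}},
}


def filter_by_hub(models, framework, model_hub):
    """Keep only the models whose entry for `framework` comes from `model_hub`."""
    return {name: info for name, info in models.items()
            if info.get(framework, {}).get("model_hub") == model_hub}


tfhub_models = filter_by_hub(example_models, "tensorflow", "TFHub")
print(sorted(tfhub_models))
```

Using `.get()` with an empty default means a model that has no entry for the requested framework is skipped instead of raising a `KeyError`.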

Set the `model_name` to the model that will be used for transfer learning.

In [None]:
# Select a model
model_name = "resnet_v1_50"

# Get information about the model (image size and the feature vector handle)
# This information will be used during transfer learning using the TensorFlow framework API
if model_name in supported_models.keys():
    model_info = supported_models[model_name][framework]
    image_size = model_info["image_size"]
    feature_vector_handle = model_info['feature_vector']
    
    print("Model Name: {}".format(model_name))
    print("TF Hub feature vector: {}".format(feature_vector_handle))
    print("Image size: {}".format(image_size))
else:
    raise ValueError("The specified model is unsupported. Please select a model from the list of supported models.")

### Select a dataset

By default, the notebook will use the [TensorFlow Flowers dataset](https://www.tensorflow.org/datasets/catalog/tf_flowers), which has flower images that belong to 5 categories.

To use your own dataset, set the `dataset_subdir` variable to the dataset path. The dataset directory is expected to have folders of images for each class, where the name of the folder will be used as the class name.

```
dataset_dir
          ├── class_a
          ├── class_b
          └── class_c
```

Optionally, the `dataset_subdir` directory can have `train` and `test`/`validation` subdirectories. For example:
```
dataset_dir
          ├── train
          |   ├── class_a
          |   ├── class_b
          |   └── class_c
          └── test
              ├── class_a
              ├── class_b
              └── class_c
```
If the dataset does not have separate folders for train and test/validation, the dataset will be split by percentage.
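
The layout rules described above can be sketched as a small helper that decides which folders hold the training and validation data. This is only an illustration of the rules, with a hypothetical function name (`find_split_dirs`); it builds throwaway directories so it can run without a real dataset.

```python
import os
import tempfile


def find_split_dirs(dataset_dir):
    """Return (train_dir, val_dir); val_dir is None when the data must be split by percentage."""
    train_dir = os.path.join(dataset_dir, "train")
    if not os.path.isdir(train_dir):
        return dataset_dir, None
    for name in ("validation", "test"):
        candidate = os.path.join(dataset_dir, name)
        if os.path.isdir(candidate):
            return train_dir, candidate
    raise ValueError('"{}" has a "train" directory, but no "validation" or "test" directory.'
                     .format(dataset_dir))


# Build two small throwaway layouts to exercise the helper
with tempfile.TemporaryDirectory() as root:
    flat = os.path.join(root, "flat")
    os.makedirs(os.path.join(flat, "class_a"))
    presplit = os.path.join(root, "presplit")
    os.makedirs(os.path.join(presplit, "train", "class_a"))
    os.makedirs(os.path.join(presplit, "test", "class_a"))

    flat_ok = find_split_dirs(flat) == (flat, None)
    presplit_ok = find_split_dirs(presplit) == (
        os.path.join(presplit, "train"), os.path.join(presplit, "test"))
    print(flat_ok, presplit_ok)
```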

In [None]:
dataset_subdir = os.path.join(dataset_directory, "flower_photos")

# Download the flowers dataset, if the folder doesn't exist
if not os.path.exists(dataset_subdir):
    os.makedirs(dataset_subdir)
    dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
    download_and_extract_tar_file(dataset_url, dataset_directory)
    
print("Dataset path:", dataset_subdir)

print("\nFolders in the dataset directory:")
for d in os.listdir(dataset_subdir):
    if os.path.isdir(os.path.join(dataset_subdir, d)):
        print("-", d)

### Define parameters

For consistency between the model training using the TensorFlow framework API and the model training using the Intel Transfer Learning Tool API, the next cell defines parameters that will be used by both methods.

In [None]:
# Number of training epochs
training_epochs = 2

# Shuffle the files after each training epoch
shuffle_files = True

# Define training/validation splits for the dataset 
# (if the dataset directory does not have subdirectories for train and test/validation)
validation_split = 0.25
training_split = 1 - validation_split

# Set seed for consistency between runs (or None)
seed = 10

# List of batch size(s) to compare (maximum of 4 batch sizes to try)
batch_size_list = [ 256, 512 ]

Validate parameter values and then print out the parameter values.

In [None]:
if not isinstance(training_epochs, int):
    raise TypeError("The training_epochs parameter should be an integer, but found a {}".format(type(training_epochs)))

if training_epochs < 1:
    raise ValueError("The training_epochs parameter should not be less than 1.")
    
if not isinstance(shuffle_files, bool):
    raise TypeError("The shuffle_files parameter should be a bool, but found a {}".format(type(shuffle_files)))

if not isinstance(validation_split, float):
    raise TypeError("The validation_split parameter should be a float, but found a {}".format(type(validation_split)))

if not isinstance(training_split, float):
    raise TypeError("The training_split parameter should be a float, but found a {}".format(type(training_split)))

if validation_split + training_split > 1:
    raise ValueError("The sum of validation_split and training_split should not be greater than 1.")

if seed and not isinstance(seed, int):
    raise TypeError("The seed parameter should be an integer or None, but found a {}".format(type(seed)))

if len(batch_size_list) > 4 or len(batch_size_list) == 0:
    raise ValueError("The batch_size_list should have between 1 and 4 values, but found {} values ({})".format(
        len(batch_size_list), batch_size_list))
    
print("Number of training epochs:", training_epochs)
print("Shuffle files:", shuffle_files)
print("Training split: {}%".format(training_split*100))
print("Validation split: {}%".format(validation_split*100))
print("Seed:", str(seed))
print("Batch size list:", batch_size_list)

Define a callback class that tracks the time that it took to run training epochs, evaluation, and batch predictions.

In [None]:
# Callback to track the training time for each epoch, evaluation time, or prediction time
class TimerCallback(tf.keras.callbacks.Callback):
    def __init__(self):
        super().__init__()
        self.epoch_times = []
        self.eval_times = []
        self.predict_times = []

    # Keras passes (epoch, logs) to the epoch hooks and only (logs) to the test and predict hooks
    def on_epoch_begin(self, epoch, logs=None):
        self.tf_timestamp = tf.timestamp()

    def on_epoch_end(self, epoch, logs=None):
        self.epoch_times.append((tf.timestamp() - self.tf_timestamp).numpy())

    def on_test_begin(self, logs=None):
        self.tf_timestamp = tf.timestamp()

    def on_test_end(self, logs=None):
        self.eval_times.append((tf.timestamp() - self.tf_timestamp).numpy())

    def on_predict_begin(self, logs=None):
        self.tf_timestamp = tf.timestamp()

    def on_predict_end(self, logs=None):
        self.predict_times.append((tf.timestamp() - self.tf_timestamp).numpy())
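
The callback above records a timestamp in each "begin" hook and appends the elapsed time in the matching "end" hook. The same pattern can be sketched without TensorFlow, using only the standard library. The class name `SimpleTimer` is hypothetical, and the loop body is a placeholder workload rather than real training.

```python
import time


class SimpleTimer:
    """Framework-free analogue of the timing callback: one start hook, one stop hook."""

    def __init__(self):
        self.durations = []
        self._start = None

    def begin(self):
        self._start = time.perf_counter()

    def end(self):
        self.durations.append(time.perf_counter() - self._start)


timer = SimpleTimer()
for _ in range(2):          # stands in for two training epochs
    timer.begin()
    sum(range(100000))      # placeholder workload
    timer.end()

print("Recorded durations:", len(timer.durations))
```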

## 3. Compare the training time for transfer learning

In this section, we will compare the time it takes to retrain the image classification model using the dataset that was selected in the previous section.

The training will be done in two different ways to compare:
* Transfer learning using the TensorFlow framework and TF Hub libraries
* Transfer learning using the Intel Transfer Learning Tool API

### Transfer learning using the TensorFlow framework and TF Hub libraries

This section goes through using the TensorFlow framework and TF Hub libraries to retrain the model using the selected dataset.

First, the dataset is loaded in, which allows us to determine the number of classes in the dataset. The original ImageNet dataset that the image classification model was trained on has 1000 classes. To do transfer learning using the new dataset, we will get the feature vector from TF Hub and then add on a classification layer that matches the number of classes in the new dataset.

If multiple batch sizes were set in the `batch_size_list`, training will be run for each batch size.

In [None]:
# Set seed
if seed:
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)

# Lists to track callbacks, datasets, models, and saved model directory for each batch size experiment
tf_time_callback_list = []
tf_dataset_list = []
tf_model_list = []
tf_export_dir_list = []
tf_history_list = []

# Check if the dataset directory has subdirectories for train/validation/test splits
val_dataset_dir = None
train_dataset_dir = dataset_subdir
if os.path.exists(os.path.join(dataset_subdir, 'train')):
    train_dataset_dir = os.path.join(dataset_subdir, 'train')
    val_dataset_dir = os.path.join(dataset_subdir, 'validation')
    
    if not os.path.exists(val_dataset_dir):
        if os.path.exists(os.path.join(dataset_subdir, 'test')):
            val_dataset_dir = os.path.join(dataset_subdir, 'test')
        else:
            raise ValueError('The dataset directory ({}) has a "train" directory, but no "validation" or "test" '
                             'directory.'.format(dataset_subdir))

    print("Using training data from {}".format(train_dataset_dir))
    print("Using validation data from {}".format(val_dataset_dir))
    
# Load the dataset
tf_dataset = tf.keras.utils.image_dataset_from_directory(train_dataset_dir, batch_size=None, seed=seed)
class_names = tf_dataset.class_names

if shuffle_files:
    tf_dataset = tf_dataset.shuffle(tf_dataset.cardinality(), reshuffle_each_iteration=False, seed=seed)

if val_dataset_dir:
    # Load the validation/test sub directory
    train_ds = tf_dataset
    val_ds = tf.keras.utils.image_dataset_from_directory(val_dataset_dir, batch_size=None, seed=seed) 
    if shuffle_files:
        val_ds = val_ds.shuffle(val_ds.cardinality(), reshuffle_each_iteration=False, seed=seed)
    train_size = len(train_ds)
    val_size = len(val_ds)
else:
    # Split the data into train/validation subsets (Note that image_dataset_from_directory can also do splitting but
    # we are doing it this way to match what the Intel Transfer Learning Tool does to ensure the same sized splits)
    train_size = int(training_split * len(tf_dataset))
    val_size = int(validation_split * len(tf_dataset))
    train_ds = tf_dataset.take(train_size)
    val_ds = tf_dataset.skip(train_size).take(val_size)

print("Training dataset size:", train_size)
print("Validation dataset size:", val_size)
    
# Preprocess the dataset
normalization_layer = tf.keras.layers.Rescaling(1. / 255)

def preprocess_image(image, label):
    image = tf.image.resize_with_pad(image, image_size, image_size)
    image = normalization_layer(image)
    return (image, label)

train_ds = train_ds.map(preprocess_image)
val_ds = val_ds.map(preprocess_image)

for batch_size in batch_size_list:
    print('-' * 40)
    print('Training using batch size: {}'.format(batch_size))
    print('-' * 40)
    
    # Batch the dataset 
    batched_train_ds = train_ds.batch(batch_size)
    batched_val_ds = val_ds.batch(batch_size)
    
    # Get the feature extractor layer from TF Hub
    feature_extractor_layer = hub.KerasLayer(feature_vector_handle,
                                             input_shape=(image_size, image_size, 3),
                                             trainable=False)

    # Add the dense layer sized according to the number of classes in our dataset
    tf_model = tf.keras.Sequential([
      feature_extractor_layer,
      tf.keras.layers.Dense(len(class_names))
    ])

    # Configure the model optimizer and loss function
    tf_model.compile(
      optimizer=tf.keras.optimizers.Adam(),
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=['acc'])
    
    tf_model.summary()

    # Define the callback for tracking the time it takes to train each epoch
    tf_time_callback = TimerCallback()

    # Train the model
    tf_history_list.append(tf_model.fit(batched_train_ds, epochs=training_epochs, shuffle=shuffle_files,
                                        callbacks=[tf_time_callback]))
    
    # Export the trained model
    tf_export_dir = os.path.join(output_directory, "tf_saved_models", model_name)
    if not os.path.exists(tf_export_dir):
        os.makedirs(tf_export_dir)
    tf_export_dir = tempfile.mkdtemp(prefix=tf_export_dir + '/')
    print("Save model to:", tf_export_dir)
    tf_model.save(tf_export_dir)
    
    # Append to lists for each batch size
    tf_time_callback_list.append(tf_time_callback)
    tf_dataset_list.append((batched_train_ds, batched_val_ds))
    tf_model_list.append(tf_model)
    tf_export_dir_list.append(tf_export_dir)
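
The split sizes and batch counts used in the cell above come from simple arithmetic, which the following sketch reproduces without TensorFlow. The image count of 3670 is an assumption chosen for illustration, and the helper names are hypothetical.

```python
import math


def split_sizes(num_images, training_split, validation_split):
    """Mirror the int() truncation used when splitting by percentage."""
    return int(training_split * num_images), int(validation_split * num_images)


def batches_per_epoch(num_images, batch_size):
    """Number of batches when the last partial batch is kept."""
    return math.ceil(num_images / batch_size)


# 3670 is an assumed image count used only for illustration
train_size, val_size = split_sizes(3670, 0.75, 0.25)
batch_counts = [batches_per_epoch(train_size, b) for b in (256, 512)]
print(train_size, val_size)   # 2752 917
print(batch_counts)           # [11, 6]
```

Because both sizes are truncated, the two subsets can add up to slightly fewer images than the full dataset (3669 of 3670 in this example).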

### Transfer learning using the Intel Transfer Learning Tool API

This section uses the Intel Transfer Learning Tool API to retrain the model using the selected dataset. This API simplifies the transfer learning process, so there are fewer lines of code compared to directly using the TensorFlow and TensorFlow Hub libraries.

In [None]:
# Use the OptimizedPlatformUtil class from the Intel Transfer Learning Tool API to set recommended settings
optimized_platform_util = OptimizedPlatformUtil(omp_num_threads=cpu_info.cores_per_socket,
                                               kmp_blocktime=0,
                                               kmp_affinity='granularity=fine,compact,1,0',
                                               tf_num_intraop_threads=cpu_info.cores_per_socket,
                                               tf_num_interop_threads=cpu_info.sockets,
                                               force_reset_env_vars=True)

for k, v in optimized_platform_util.env_vars_dict.items():
    if v is not None:
        print("{}: {}".format(k, v))
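
For readers unfamiliar with these settings, the sketch below collects the same values into a plain dictionary. `OMP_NUM_THREADS`, `KMP_BLOCKTIME`, and `KMP_AFFINITY` are OpenMP runtime environment variables; how `OptimizedPlatformUtil` applies them internally is not shown in this notebook, so the mapping here is an assumption. The helper name and the core and socket counts are made up.

```python
def thread_settings(cores_per_socket, sockets):
    """Build a dictionary of threading settings similar in spirit to the arguments above."""
    return {
        "OMP_NUM_THREADS": str(cores_per_socket),        # one OpenMP thread per physical core
        "KMP_BLOCKTIME": "0",                            # threads sleep right after finishing work
        "KMP_AFFINITY": "granularity=fine,compact,1,0",  # pin threads to cores
        "intra_op_threads": cores_per_socket,            # parallelism within a single op
        "inter_op_threads": sockets,                     # parallelism across independent ops
    }


# 28 cores per socket and 2 sockets are hypothetical values
settings = thread_settings(cores_per_socket=28, sockets=2)
for key, value in settings.items():
    print("{}: {}".format(key, value))
```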

In [None]:
# Lists to track callbacks, datasets, models, and saved model directory for each batch size experiment
tlt_time_callback_list = []
tlt_dataset_list = []
tlt_model_list = []
tlt_export_dir_list = []
tlt_history_list = []
    
for batch_size in batch_size_list:
    print('-' * 40)
    print('Training using batch size: {}'.format(batch_size))
    print('-' * 40)
    
    # Use the model factory to get the model
    tlt_model = model_factory.get_model(model_name=model_name, framework=framework)
    
    # Load, split, and preprocess the dataset
    tlt_dataset = dataset_factory.load_dataset(dataset_dir=dataset_subdir, use_case=use_case, framework=framework)
    
    if not tlt_dataset.train_subset:
        tlt_dataset.shuffle_split(train_pct=training_split, val_pct=validation_split, seed=seed, shuffle_files=shuffle_files)
    
    tlt_dataset.preprocess(tlt_model.image_size, batch_size=batch_size)
    
    # Define the callback for tracking the time it takes to train each epoch
    tlt_time_callback = TimerCallback()

    # Train the model
    tlt_history_list.append(tlt_model.train(tlt_dataset, output_dir=output_directory, epochs=training_epochs,
                                            shuffle_files=shuffle_files, do_eval=False, callbacks=tlt_time_callback,
                                            seed=seed))

    # Export the trained model
    tlt_export_dir = os.path.join(output_directory, "tlt_saved_models")
    tlt_export_dir = tlt_model.export(tlt_export_dir)
    
    # Append to lists for each batch size
    tlt_time_callback_list.append(tlt_time_callback)
    tlt_dataset_list.append(tlt_dataset)
    tlt_model_list.append(tlt_model)
    tlt_export_dir_list.append(tlt_export_dir)

### Optimize the model using the Intel Transfer Learning Tool API

After training, the Intel Transfer Learning Tool can optimize the model to improve inference performance. This is done by using the [Intel® Neural Compressor](https://github.com/intel/neural-compressor) to either quantize the model or optimize the full precision model.

First, we set up a configuration with parameters that will be used by the Intel Neural Compressor for quantization.

In [None]:
tlt_quantization_dir_list = []
tlt_optimized_dir_list = []
inc_config_list = []

# Create a tuning workspace directory for INC
nc_workspace = os.path.join(output_directory, 'nc_workspace')

# Relative accuracy loss (1%)
relative_accuracy_criterion = 0.01

# Define the exit policy timeout (in seconds) and max number of trials. The tuning process finishes when
# the timeout or max trials is reached. A tuning timeout of 0 means that the tuning phase stops when the
# accuracy criterion is met.
timeout = 0
max_trials=15

for i, batch_size in enumerate(batch_size_list):
    # Create output directories for the quantized and optimized models
    tlt_quantization_dir = os.path.join(output_directory, 'tlt_quantized_models', model_name, os.path.basename(tlt_export_dir_list[i]))
    tlt_optimized_dir = os.path.join(output_directory, 'tlt_optimized_models', model_name, os.path.basename(tlt_export_dir_list[i]))

    # Create an Intel Neural Compressor config based on the inputs that we are using
    inc_config_list.append(tlt_model_list[i].get_inc_config(accuracy_criterion_relative=relative_accuracy_criterion,
                                                            exit_policy_timeout=timeout,
                                                            exit_policy_max_trials=max_trials))
    
    # Append to lists for each batch size
    tlt_quantization_dir_list.append(tlt_quantization_dir)
    tlt_optimized_dir_list.append(tlt_optimized_dir)
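
To illustrate what a relative accuracy criterion of 0.01 means, the sketch below checks whether a candidate accuracy stays within 1% relative loss of a baseline, and stops at the first trial that does. It assumes the relative loss is computed as `(baseline - candidate) / baseline`; it is not the Intel Neural Compressor's tuning loop, and the accuracy values are invented.

```python
def within_relative_loss(baseline_accuracy, candidate_accuracy, relative_criterion):
    """True if the candidate loses no more than `relative_criterion` of the baseline accuracy."""
    relative_loss = (baseline_accuracy - candidate_accuracy) / baseline_accuracy
    return relative_loss <= relative_criterion


def first_acceptable_trial(baseline_accuracy, trial_accuracies, relative_criterion, max_trials):
    """Return the 1-based index of the first acceptable trial, or None if none qualifies."""
    for trial, accuracy in enumerate(trial_accuracies[:max_trials], start=1):
        if within_relative_loss(baseline_accuracy, accuracy, relative_criterion):
            return trial
    return None


print(within_relative_loss(0.90, 0.892, 0.01))   # True: about 0.9% relative loss
print(within_relative_loss(0.90, 0.880, 0.01))   # False: about 2.2% relative loss
print(first_acceptable_trial(0.90, [0.85, 0.88, 0.895], 0.01, max_trials=15))   # 3
```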

Next, we quantize the model using the config that was just generated. Quantization aims to improve inference
performance by reducing the number of bits required, while maintaining close to the same accuracy as the full precision model.

In [None]:
for i, batch_size in enumerate(batch_size_list):
    # Quantize the model that was trained with this batch size
    tlt_model_list[i].quantize(tlt_quantization_dir_list[i], tlt_dataset_list[i], config=inc_config_list[i])
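
As a rough intuition for how fewer bits can preserve most of the accuracy, the sketch below maps a handful of floating point values to 8-bit codes and back. This is a generic min/max affine scheme written for illustration only; it is not the algorithm that the Intel Neural Compressor applies, and the weights are made up.

```python
def quantize_uint8(values):
    """Map floats to integer codes in [0, 255] using the min/max range of the values."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate floats from the integer codes."""
    return [c * scale + lo for c in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]   # made-up values
codes, scale, lo = quantize_uint8(weights)
restored = dequantize(codes, scale, lo)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(codes[0], codes[-1])                # 0 255
print(max_error <= scale / 2 + 1e-9)      # True
```

Each restored value differs from the original by at most half of one quantization step, which is why accuracy often stays close to that of the full precision model.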

Another option to improve inference performance is using graph optimization through the Intel Neural Compressor which:
* Converts variables to constants
* Removes training-only operations like checkpoint saving
* Strips out parts of the graph that are never reached
* Removes debug operations like CheckNumerics
* Folds batch normalization ops into the pre-calculated weights
* Fuses common operations into unified versions

In [None]:
for i, batch_size in enumerate(batch_size_list):
    # Optimize the full precision model that was trained with this batch size
    tlt_model_list[i].optimize_graph(tlt_optimized_dir_list[i])
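
One of the optimizations listed above, folding batch normalization into the weights, can be checked with scalar arithmetic. The sketch below folds the batch normalization parameters into a single weight and bias and confirms that the result is unchanged. All values are invented, and this shows only the algebra, not how the Intel Neural Compressor rewrites a graph.

```python
import math


def batch_norm(x, gamma, beta, mean, variance, eps=1e-3):
    """Standard inference-time batch normalization for a scalar."""
    return gamma * (x - mean) / math.sqrt(variance + eps) + beta


def fold_batch_norm(weight, bias, gamma, beta, mean, variance, eps=1e-3):
    """Return a (weight, bias) pair equivalent to the linear op followed by batch norm."""
    factor = gamma / math.sqrt(variance + eps)
    return weight * factor, (bias - mean) * factor + beta


weight, bias = 0.8, 0.1                              # made-up layer parameters
gamma, beta, mean, variance = 1.5, -0.2, 0.3, 0.04   # made-up batch norm statistics
x = 2.0

unfolded = batch_norm(weight * x + bias, gamma, beta, mean, variance)
folded_weight, folded_bias = fold_batch_norm(weight, bias, gamma, beta, mean, variance)
folded = folded_weight * x + folded_bias

print(math.isclose(unfolded, folded))   # True
```

After folding, inference needs one multiply and one add per value instead of the separate normalization step.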

### Compare training times

The table below compares the time it took to train each epoch (in seconds) using the TensorFlow framework libraries directly versus the Intel Transfer Learning Tool API.

In [None]:
display_df = []
plt.figure(figsize=(10,6))

for i, batch_size in enumerate(batch_size_list):
    # Sanity check that both datasets had the same number of batches
    if len(tf_dataset_list[i][0]) != len(tlt_dataset_list[i].train_subset):
        print("WARNING: For batch size {}, the TF training dataset had {} batches and the TLT training dataset had "
              "{} batches. These values should have been the same.".format(batch_size, len(tf_dataset_list[i][0]), len(tlt_dataset_list[i].train_subset)))
    
    # Calculate images/second
    tf_images_per_second = [train_size / t for t in tf_time_callback_list[i].epoch_times]
    tlt_images_per_second = [train_size / t for t in tlt_time_callback_list[i].epoch_times]
    performance_delta = ["{0:.2f}%".format((tlt_ips - tf_ips) / tf_ips * 100)
                         for tf_ips, tlt_ips in zip(tf_images_per_second, tlt_images_per_second)]

    # Graph the results
    epoch_list = [str(i) for i in range(1, training_epochs + 1)]
    tf_train_time = tf_time_callback_list[i].epoch_times
    tlt_train_time = tlt_time_callback_list[i].epoch_times

    plt.plot(epoch_list, tf_train_time, label="Using TF libraries with batch size {}".format(batch_size),
             linestyle=line_styles[i], marker=marker_styles[i], color=orange)
    plt.plot(epoch_list, tlt_train_time, label="Using TLT with batch size {}".format(batch_size), 
             linestyle=line_styles[i], marker=marker_styles[i],color=blue)
    
    # Create a DataFrame to display the results in a table
    df = pd.DataFrame({
        'TF epoch time<br>(seconds)': tf_time_callback_list[i].epoch_times,
        'TLT epoch time<br>(seconds)': tlt_time_callback_list[i].epoch_times,
        'TF throughput<br>(images/sec)': tf_images_per_second,
        'TLT throughput<br>(images/sec)': tlt_images_per_second,
        'Performance<br>Boost': performance_delta
    })
    df.index += 1 
    df = df.style.set_table_styles(table_styles).set_caption("Epoch training times with batch size {}".format(batch_size))
    display_df.append(df)

plt.title("Training time per epoch")
plt.xlabel("Epoch")
plt.ylabel("Seconds")
plt.legend()
plt.show()

# Display tables with epoch training time for each batch size
for df in display_df:
    display(df)
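
The throughput and "Performance Boost" columns are derived from the epoch times with two short formulas, reproduced in the sketch below. The epoch times are hypothetical placeholders and are not measured results.

```python
def throughput(num_images, epoch_times):
    """Images processed per second for each epoch."""
    return [num_images / t for t in epoch_times]


def percent_delta(baseline, candidate):
    """Relative change of `candidate` over `baseline`, formatted as a percentage."""
    return ["{0:.2f}%".format((c - b) / b * 100) for b, c in zip(baseline, candidate)]


# Hypothetical epoch times in seconds, not measured results
tf_ips = throughput(2752, [40.0, 32.0])
tlt_ips = throughput(2752, [32.0, 32.0])
deltas = percent_delta(tf_ips, tlt_ips)
print(deltas)   # ['25.00%', '0.00%']
```

A positive percentage means the second method processed more images per second than the first for that epoch.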

Next, visualize the accuracy and loss metrics collected during training.

In [None]:
# Graph the training accuracy by epoch for each batch size
plt.figure(figsize=(10,6))
for i, batch_size in enumerate(batch_size_list):
    tf_acc_time = [acc * 100 for acc in tf_history_list[i].history['acc']]
    tlt_acc_time = [acc * 100 for acc in tlt_history_list[i]['acc']]

    plt.plot(epoch_list, tf_acc_time, label = "Using TF libraries (batch size = {})".format(batch_size), linestyle=line_styles[i], marker=marker_styles[i], color=orange)
    plt.plot(epoch_list, tlt_acc_time, label = "Using TLT (batch size = {})".format(batch_size), linestyle=line_styles[i], marker=marker_styles[i], color=blue)

plt.title("Training Accuracy by Epoch")
plt.xlabel("Epoch")
plt.ylabel("Accuracy (%)")
plt.legend()
plt.show()

# Graph the training loss by epoch for each batch size
plt.figure(figsize=(10,6))
for i, batch_size in enumerate(batch_size_list):
    tf_loss_time = tf_history_list[i].history['loss']
    tlt_loss_time = tlt_history_list[i]['loss']

    plt.plot(epoch_list, tf_loss_time, label = "Using TF libraries (batch size = {})".format(batch_size), linestyle=line_styles[i], marker=marker_styles[i], color=orange)
    plt.plot(epoch_list, tlt_loss_time, label = "Using TLT (batch size = {})".format(batch_size), linestyle=line_styles[i], marker=marker_styles[i], color=blue)

plt.title("Training Loss by Epoch")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.show()

## 4. Evaluate and predict

This section calls evaluation and prediction methods for the models trained using the TensorFlow libraries and the Intel Transfer Learning Tool.

### Evaluate the models using the validation data

First, evaluate the models trained using the TensorFlow libraries.

In [None]:
tf_eval_callback_list = []
tf_eval_metrics_list = []

# Evaluate using the TensorFlow framework model for each batch size
for i, batch_size in enumerate(batch_size_list):
    print('-' * 40)
    print('Evaluate using batch size: {}'.format(batch_size))
    print('-' * 40)
    
    tf_eval_callback = TimerCallback()
    
    # Use the test split of the dataset to evaluate the model
    tf_eval_metrics_list.append(tf_model_list[i].evaluate(tf_dataset_list[i][1], callbacks=tf_eval_callback))
    tf_eval_callback_list.append(tf_eval_callback)

Next, evaluate the models trained using the Intel Transfer Learning Tool.

In [None]:
tlt_eval_callback_list = []
tlt_eval_metrics_list = []

# Evaluate using the Intel Transfer Learning Tool model for each batch size
for i, batch_size in enumerate(batch_size_list):
    print('-' * 40)
    print('Evaluate using batch size: {}'.format(batch_size))
    print('-' * 40)
    
    use_test_set = tlt_dataset_list[i].validation_subset is None and tlt_dataset_list[i].test_subset is not None
    
    tlt_eval_callback = TimerCallback()
    tlt_eval_metrics_list.append(tlt_model_list[i].evaluate(tlt_dataset_list[i], callbacks=tlt_eval_callback, use_test_set=use_test_set))
    tlt_eval_callback_list.append(tlt_eval_callback)

After all the models have been evaluated, visualize the results using charts that display the time that it took to evaluate each model and the accuracy that was found when using the validation dataset.

In [None]:
# Bar chart group labels
groups = ["batch size = {}".format(bs) for bs in batch_size_list]

# Create grouped bar chart for evaluation time
decimals = 3 # number of decimals to use for rounding
tf_eval_times = [round(callback.eval_times[0], decimals) for callback in tf_eval_callback_list]
tlt_eval_times = [round(callback.eval_times[0], decimals) for callback in tlt_eval_callback_list]

x = np.arange(len(groups))
width = 0.24  # the width of the bars
multiplier = 0

# Setup bars for evaluation times
fig, (ax1, ax2) = plt.subplots(2)
fig.set_figheight(10)
fig.set_figwidth(10)
rects_tf = ax1.bar(x - width/2, tf_eval_times, width, label='TF eval', color=orange)
rects_tlt = ax1.bar(x + width/2, tlt_eval_times, width, label='TLT eval', color=blue)
ax1.bar_label(rects_tf, padding=3)
ax1.bar_label(rects_tlt, padding=3)

# Add labels, title, and legend
ax1.set_ylabel('Seconds')
ax1.set_title('Evaluation time')
ax1.set_xticks(x, groups)
ax1.set_ymargin(0.2) 
ax1.legend(ncols=2)
#plt.show()

# Evaluation accuracy comparison
decimals = 2
tf_acc_index = tf_model_list[0].metrics_names.index('acc')
tlt_acc_index = tlt_model_list[0]._model.metrics_names.index('acc')
tf_eval_accuracy = [round(x[tf_acc_index] * 100, decimals) for x in tf_eval_metrics_list]
tlt_eval_accuracy = [round(x[tlt_acc_index] * 100, decimals) for x in tlt_eval_metrics_list]

# Setup bars for evaluation accuracy
rects_tf = ax2.bar(x - width/2, tf_eval_accuracy, width, label='TF accuracy', color=orange)
rects_tlt = ax2.bar(x + width/2, tlt_eval_accuracy, width, label='TLT accuracy', color=blue)
ax2.bar_label(rects_tf, padding=3)
ax2.bar_label(rects_tlt, padding=3)

# Add labels, title, and legend
ax2.set_ylabel('Accuracy (%)')
ax2.yaxis.set_major_formatter(mtick.PercentFormatter())
ax2.set_title('Evaluation accuracy using the validation data')
ax2.set_xticks(x, groups)
ax2.set_ymargin(0.2) 
ax2.legend(ncols=2)
plt.show()

### Predict using a batch of images

Use the TensorFlow libraries to get a batch of images and predict using the trained models.

In [None]:
tf_predict_callback_list = []

for i, batch_size in enumerate(batch_size_list):
    print('-' * 50)
    print('Predict on a single batch (batch size = {})'.format(batch_size))
    print('-' * 50)
    
    tf_predict_time = TimerCallback()
    dataset_batch = next(iter(tf_dataset_list[i][0]))
    tf_batch, _ = dataset_batch
    batch_predictions = tf_model_list[i].predict(tf_batch, callbacks=tf_predict_time)
    tf_predict_callback_list.append(tf_predict_time)

Similarly, use the Intel Transfer Learning Tool API to get a batch of images and predict using the trained models.

In [None]:
tlt_predict_callback_list = []

for i, batch_size in enumerate(batch_size_list):
    print('-' * 50)
    print('Predict on a single batch (batch size = {})'.format(batch_size))
    print('-' * 50)
    
    tlt_predict_time = TimerCallback()

    tlt_batch, _ = tlt_dataset_list[i].get_batch(subset='train')
    predictions = tlt_model_list[i].predict(tlt_batch, callbacks=tlt_predict_time)
    tlt_predict_callback_list.append(tlt_predict_time)

Visualize the time that it took to get predictions for a batch of images for each model.

In [None]:
# Create grouped bar chart for prediction time
decimals = 3   # number of decimals to use for rounding
tf_predict_times = [round(callback.predict_times[0], decimals) for callback in tf_predict_callback_list]
tlt_predict_times = [round(callback.predict_times[0], decimals) for callback in tlt_predict_callback_list]

# Set up bars for prediction times
fig, ax = plt.subplots()
fig.set_figheight(6)
fig.set_figwidth(10)
rects_tf = ax.bar(x - width/2, tf_predict_times, width, label='TF predict', color=orange)
rects_tlt = ax.bar(x + width/2, tlt_predict_times, width, label='TLT predict', color=blue)
ax.bar_label(rects_tf, padding=3)
ax.bar_label(rects_tlt, padding=3)

# Add labels, title, and legend
ax.set_ylabel('Seconds')
ax.set_title('Prediction time for a single batch')
ax.set_xticks(x, groups)
ax.set_ymargin(0.2) 
ax.legend(ncols=2)
plt.show()

### Check performance using the Intel® Neural Compressor

Use the [Intel Neural Compressor](https://github.com/intel/neural-compressor/tree/master) to determine the performance of the exported models. 

We will compare:
* The original model that was trained using the TensorFlow and TF Hub libraries
* The model trained using the Intel Transfer Learning Tool
* The model trained and quantized using the Intel Transfer Learning Tool
* The model trained and optimized using the Intel Transfer Learning Tool

In [None]:
test_dataset_dir = dataset_subdir

if os.path.exists(os.path.join(dataset_subdir, 'validation')):
    test_dataset_dir = os.path.join(dataset_subdir, 'validation')
elif os.path.exists(os.path.join(dataset_subdir, 'test')):
    test_dataset_dir = os.path.join(dataset_subdir, 'test')
    
print("Test dataset directory:", test_dataset_dir)

Use the Intel Neural Compressor to get the performance of the model trained using the TensorFlow libraries.

Note that you may see a `zmq.error.ZMQError: Address already in use` error in the output, which is a known issue when running the Intel Neural Compressor from Jupyter notebooks. If this happens, rerun the cell.
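
If rerunning the cell by hand becomes tedious, a small retry wrapper is one option. The sketch below is a hypothetical helper (`run_with_retry` is not part of the Intel Transfer Learning Tool or the Intel Neural Compressor), and it assumes that the error reaches the notebook as a Python exception whose message contains `Address already in use`. If the error only appears in the log output and no exception is raised, this wrapper will not detect it.

```python
import time


def run_with_retry(fn, retries=3, delay_seconds=1.0, match="Address already in use"):
    """Call fn(), retrying when the raised error message contains `match`."""
    for attempt in range(1, retries + 1):
        try:
            return fn()
        except Exception as e:
            # Re-raise unrelated errors, and give up after the last attempt
            if match not in str(e) or attempt == retries:
                raise
            print("Attempt {} failed ({}), retrying...".format(attempt, e))
            time.sleep(delay_seconds)


# Demonstration with a function that fails once and then succeeds
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 2:
        raise OSError("Address already in use")
    return "ok"


result = run_with_retry(flaky, delay_seconds=0)
print(result, calls["count"])
```

In the cell below, the call to `inc_utils.performance(...)` could be wrapped in a `lambda` and passed to this helper.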

In [None]:
tf_latency_list = []
tf_throughput_list = []

for i, batch_size in enumerate(batch_size_list):
    print('-' * 90)
    print('Check performance for TF model (batch size = {})'.format(batch_size))
    print('Saved model directory: {}'.format(tf_export_dir_list[i]))
    print('-' * 90)
    
    results = inc_utils.performance(tf_export_dir_list[i], batch_size, image_size, test_dataset_dir, framework)
    tf_latency, tf_throughput = inc_utils.calculate_latency_and_throughput(results)
    
    tf_latency_list.append(tf_latency)
    tf_throughput_list.append(tf_throughput)

Next, get the performance of the model that was trained and exported by the Intel Transfer Learning Tool.

In [None]:
tlt_latency_list = []
tlt_throughput_list = []

for i, batch_size in enumerate(batch_size_list):
    print('-' * 90)
    print('Check performance for TLT model (batch size = {})'.format(batch_size))
    print('Saved model directory: {}'.format(tlt_export_dir_list[i]))
    print('-' * 90)
    
    tlt_results = inc_utils.performance(tlt_export_dir_list[i], batch_size, image_size, test_dataset_dir, framework)
    tlt_latency, tlt_throughput = inc_utils.calculate_latency_and_throughput(tlt_results)
    
    tlt_latency_list.append(tlt_latency)
    tlt_throughput_list.append(tlt_throughput)

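Use the Intel Neural Compressor to get the performance of the model that was quantized using the Intel Transfer Learning Tool. If the quantized model is missing or the benchmark fails, a latency and throughput of 0 are recorded for that batch size.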

In [None]:
quantized_latency_list = []
quantized_throughput_list = []

for i, batch_size in enumerate(batch_size_list):
    try:
        tlt_quantized_latency = 0
        tlt_quantized_throughput = 0
        
        print('-' * 90)
        print('Check performance for TLT quantized model (batch size = {})'.format(batch_size))
        print('Saved model directory: {}'.format(tlt_quantization_dir_list[i]))
        print('-' * 90)
        
        if not os.path.exists(os.path.join(tlt_quantization_dir_list[i], 'saved_model.pb')):
            raise FileNotFoundError("The quantized model was not found at: {}\nQuantization may have failed for this model/batch size.".format(tlt_quantization_dir_list[i],))
    
        tlt_quantized_results = inc_utils.performance(tlt_quantization_dir_list[i], batch_size, image_size, test_dataset_dir, framework)
        tlt_quantized_latency, tlt_quantized_throughput = inc_utils.calculate_latency_and_throughput(tlt_quantized_results)
    except Exception as e:
        print("Error when trying to check the performance for the quantized model with batch size {}".format(batch_size))
        print(e)
    finally:
        quantized_latency_list.append(tlt_quantized_latency)
        quantized_throughput_list.append(tlt_quantized_throughput)

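Finally, use the Intel Neural Compressor to get the performance of the model that was optimized using the Intel Transfer Learning Tool. As with the quantized model, a latency and throughput of 0 are recorded for any batch size where the optimized model is missing or the benchmark fails.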

In [None]:
optimized_latency_list = []
optimized_throughput_list = []

for i, batch_size in enumerate(batch_size_list):
    try:
        tlt_optimized_latency = 0
        tlt_optimized_throughput = 0
        
        print('-' * 90)
        print('Check performance for TLT optimized model (batch size = {})'.format(batch_size))
        print('Saved model directory: {}'.format(tlt_optimized_dir_list[i]))
        print('-' * 90)
        
        if not os.path.exists(os.path.join(tlt_optimized_dir_list[i], 'saved_model.pb')):
            raise FileNotFoundError("The optimized model was not found at: {}\nOptimization may have failed for this model/batch size.".format(tlt_optimized_dir_list[i],))

        tlt_optimized_results = inc_utils.performance(tlt_optimized_dir_list[i], batch_size, image_size, test_dataset_dir, framework)
        
        tlt_optimized_latency, tlt_optimized_throughput = inc_utils.calculate_latency_and_throughput(tlt_optimized_results)
    except Exception as e:
        print("Error when trying to check the performance for the optimized model with batch size {}".format(batch_size))
        print(e)
    finally:
        optimized_latency_list.append(tlt_optimized_latency)
        optimized_throughput_list.append(tlt_optimized_throughput)

Visualize the latency and throughput results for all of the models.
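
When reading the charts, it can help to recall how the two metrics relate. The sketch below uses one common convention, stated here as an assumption: throughput is the number of images processed per second, and latency is the average time per image in milliseconds. This is for illustration only and is not necessarily how `inc_utils.calculate_latency_and_throughput` or the Intel Neural Compressor computes its results. The function name and input values are hypothetical.

```python
def latency_and_throughput(elapsed_seconds, num_batches, batch_size):
    """Derive per-image latency (ms) and throughput (images/second).

    Assumed convention, for illustration only:
      throughput = images processed / elapsed time
      latency    = elapsed time / images processed
    """
    if elapsed_seconds <= 0 or num_batches <= 0 or batch_size <= 0:
        raise ValueError("all inputs must be positive")

    num_images = num_batches * batch_size
    throughput = num_images / elapsed_seconds
    latency_ms = 1000.0 * elapsed_seconds / num_images
    return latency_ms, throughput


# Made-up numbers: 10 batches of 32 images processed in 2 seconds
latency_ms, throughput = latency_and_throughput(
    elapsed_seconds=2.0, num_batches=10, batch_size=32)
print("Latency: {:.2f} ms, throughput: {:.0f} images/second".format(latency_ms, throughput))
```

Under this convention the two values are reciprocal (latency in ms times throughput equals 1000), so a model that improves one also improves the other.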

In [None]:
width = 0.18   # the width of the bars

# Round the latency values
decimals = 2   # number of decimals to use for rounding
# Use `v` as the loop variable to avoid shadowing `x` (the bar positions)
tf_latency_list = [0 if math.isnan(v) else round(v, decimals) for v in tf_latency_list]
tlt_latency_list = [0 if math.isnan(v) else round(v, decimals) for v in tlt_latency_list]
quantized_latency_list = [0 if math.isnan(v) else round(v, decimals) for v in quantized_latency_list]
optimized_latency_list = [0 if math.isnan(v) else round(v, decimals) for v in optimized_latency_list]

# Set up the grouped bar chart for latency
fig, ax = plt.subplots()
fig.set_figheight(6)
fig.set_figwidth(10)
rects_tf = ax.bar(x, tf_latency_list, width, label='TF latency', color=orange)
rects_tlt = ax.bar(x + width, tlt_latency_list, width, label='TLT latency', color=blue)
rects_quant = ax.bar(x + width * 2, quantized_latency_list, width, label='TLT quantized latency', color=yellow)
rects_opt = ax.bar(x + width * 3, optimized_latency_list, width, label='TLT optimized latency', color=dark_blue)
ax.bar_label(rects_tf, padding=3)
ax.bar_label(rects_tlt, padding=3)
ax.bar_label(rects_quant, padding=3)
ax.bar_label(rects_opt, padding=3)

# Add labels, title, and legend
ax.set_ylabel('Milliseconds')
ax.set_title('Latency')
ax.set_xticks(x + width*1.5, groups)
ax.set_ymargin(0.2) 
ax.legend(ncols=2)
plt.show()

# Round the throughput values
decimals = 0   # number of decimals to use for rounding
tf_throughput_list = [round(v, decimals) for v in tf_throughput_list]
tlt_throughput_list = [round(v, decimals) for v in tlt_throughput_list]
quantized_throughput_list = [round(v, decimals) for v in quantized_throughput_list]
optimized_throughput_list = [round(v, decimals) for v in optimized_throughput_list]

# Set up the grouped bar chart for throughput
fig, ax = plt.subplots()
fig.set_figheight(6)
fig.set_figwidth(10)
rects_tf = ax.bar(x, tf_throughput_list, width, label='TF throughput', color=orange)
rects_tlt = ax.bar(x + width, tlt_throughput_list, width, label='TLT throughput', color=blue)
rects_quant = ax.bar(x + width * 2, quantized_throughput_list, width, label='TLT quantized throughput', color=yellow)
rects_opt = ax.bar(x + width * 3, optimized_throughput_list, width, label='TLT optimized throughput', color=dark_blue)
ax.bar_label(rects_tf, padding=3)
ax.bar_label(rects_tlt, padding=3)
ax.bar_label(rects_quant, padding=3)
ax.bar_label(rects_opt, padding=3)

# Add labels, title, and legend
ax.set_ylabel('images/second')
ax.set_title('Throughput')
ax.set_xticks(x + width*1.5, groups)
ax.set_ymargin(0.2) 
ax.legend(ncols=2)
plt.show()

The experiments in this notebook compare training time and inference/evaluation metrics between the TensorFlow libraries and the Intel Transfer Learning Tool, and show how batch size affects performance. To experiment further, rerun this notebook with a different model, dataset, and/or training parameters.
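
One way to condense the latency and throughput charts into a single number per batch size is a speedup ratio relative to the TensorFlow baseline. The sketch below is a hypothetical helper with made-up values, not results from this notebook. It treats 0 (the placeholder recorded above when a benchmark fails) and NaN as missing rather than dividing by them.

```python
import math


def speedup_over_baseline(baseline, candidate, higher_is_better=True):
    """Return per-batch-size speedup ratios, or None where a value is missing."""
    ratios = []
    for base, cand in zip(baseline, candidate):
        # 0 is the placeholder for a failed run; NaN is also treated as missing
        if not base or not cand or math.isnan(base) or math.isnan(cand):
            ratios.append(None)
        elif higher_is_better:
            # Throughput: larger is better, so divide candidate by baseline
            ratios.append(round(cand / base, 2))
        else:
            # Latency: smaller is better, so divide baseline by candidate
            ratios.append(round(base / cand, 2))
    return ratios


# Made-up throughput values (images/second) for three batch sizes
example_tf_throughput = [100.0, 200.0, 400.0]
example_quantized_throughput = [150.0, 0, 1000.0]   # 0 marks a failed run

print(speedup_over_baseline(example_tf_throughput, example_quantized_throughput))
# → [1.5, None, 2.5]
```

For latency lists, pass `higher_is_better=False` so that a ratio above 1 still means the candidate model is faster.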

Other related notebooks:
* [Transfer Learning for Image Classification using TensorFlow and the Intel® Transfer Learning Tool API](../image_classification/tlt_api_tf_image_classification/TLT_TF_Image_Classification_Transfer_Learning.ipynb)
* [Transfer Learning for Image Classification using PyTorch and the Intel® Transfer Learning Tool API](../image_classification/tlt_api_pyt_image_classification/TLT_PyTorch_Image_Classification_Transfer_Learning.ipynb)