In this section we collect more detailed explanations to help clear up some topics that are only lightly touched upon in the main notes.
A.1 Wavelet decomposition for cats and dogs
In the introduction of the Clustering and Classification section we discuss how to use the wavelet transformation to transform the images of cats and dogs into a different basis. Here are the details of how this is performed, using cat zero as an example.
We use the Haar wavelet and we only need one level of transformation. As usual, we obtain four images, each at half the resolution, that represent the decomposition: a downsampled version of the original image, one highlighting the vertical features, one highlighting the horizontal features, and one highlighting the diagonal features.
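The decomposition described above can be sketched directly with NumPy. This is a minimal, hypothetical implementation (the notes may use a wavelet library instead); it computes one level of the 2-D Haar transform by combining the four pixels of each 2×2 block:

```python
import numpy as np

def haar2d(img):
    """One level of the 2-D Haar transform; each output is half the resolution."""
    a = img[0::2, 0::2].astype(float)  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2].astype(float)  # top-right
    c = img[1::2, 0::2].astype(float)  # bottom-left
    d = img[1::2, 1::2].astype(float)  # bottom-right
    ll = (a + b + c + d) / 2  # downsampled approximation
    lh = (a + b - c - d) / 2  # horizontal features
    hl = (a - b + c - d) / 2  # vertical features
    hh = (a - b - c + d) / 2  # diagonal features
    return ll, lh, hl, hh
```

Each of the four returned images has half the width and half the height of the input, matching the description above.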
Figure A.4: The horizontal highlights of the image.
Figure A.5: The diagonal highlights of the image.
For our purposes, only the vertical and horizontal features are of interest, and we combine these two images. To make sure the features are highlighted optimally, we need to rescale the images before combining them. For this we use a function similar to MATLAB's wcodemat.
import numpy as np

def rescale(data, nb):
    """Rescale data to the integer levels 1..nb, similar to MATLAB's wcodemat."""
    x = np.abs(data)
    x = x - np.min(x)
    x = nb * x / np.max(x)
    x = 1 + np.fix(x)
    x[x > nb] = nb
    return x
Note that the resulting image has only one fourth of the pixels of the original image. We can also visualize the transformation steps as shown in Figure A.8.
Figure A.8: Workflow to get from the original image to the wavelet transformed version.
A.2 Precision/Recall trade-off
In Section 2.2 we discuss the performance topics and come across the so-called precision/recall trade-off.
Let's remind ourselves of the definitions:
Recall, or true positive rate (TPR), is the fraction of relevant instances that are retrieved, i.e. true positives over all actual positives\[
\operatorname{recall} = \frac{TP}{P} = \frac{TP}{TP + FN}.
\]
Precision, on the other hand, is the fraction of relevant instances among all retrieved instances, i.e. true positives over the sum of true positives and false positives. \[
\operatorname{precision} = \frac{TP}{TP + FP}.
\]
In order to understand why precision and recall influence each other we need to understand how our classifier works.
Internally each observation given to the classifier is fed into a decision function that returns a score.
The score lives on some scale and, in the default setting, everything above zero is counted as a match; choosing a different threshold changes this. See Figure A.9. In the presented example the precision ranges from \(71\%\) to \(100\%\), while at the same time the recall ranges from \(100\%\) down to \(60\%\).
Figure A.9: Some representatives and their score and three different thresholds and the corresponding results for precision and recall.
Code that provides the basis for the above figure.
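The original listing is not reproduced here; a minimal sketch of how such a figure can be produced might look as follows. The scores and labels below are made up for illustration and are not the notes' actual data:

```python
import numpy as np

# hypothetical decision-function scores and true labels (1 = positive class)
scores = np.array([-3.5, -2.0, -1.0, -0.5, 0.2, 0.8, 1.5, 2.2, 3.0, 4.1])
labels = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1])

def precision_recall(scores, labels, threshold):
    """Compute precision and recall when matching everything at or above threshold."""
    pred = scores >= threshold
    tp = np.sum(pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))
    fn = np.sum(~pred & (labels == 1))
    precision = tp / (tp + fp) if tp + fp > 0 else 1.0
    recall = tp / (tp + fn)
    return precision, recall

for t in (-1.5, 0.0, 1.0):
    p, r = precision_recall(scores, labels, t)
    print(f"threshold {t:+.1f}: precision {p:.2f}, recall {r:.2f}")
```

Sweeping the threshold over the whole range of scores yields the curves in Figures A.10 and A.11: raising the threshold tends to increase precision and decrease recall.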
Figure A.10: Precision and recall vs. the score of the decision function.
Figure A.11: Precision vs. recall.
With the help of the precision vs. recall curve we can select a threshold appropriate for our classification, i.e. balance precision and recall as we see fit and as our classification allows.
A.3 Details on softmax
For Neural Networks (NNs) a common activation function is the so called softmax function, defined as \[
\sigma: \mathbb{R}^n \to (0, 1)^n, \quad \sigma(x)_i = \frac{\exp(x_i)}{\sum_j{\exp(x_j)}}.
\]
In this section we try to explain in more detail what it does and what it is used for.
This particular activation function is most commonly used as the output layer of an NN for a multi-class classification problem, as is the case in our dogs vs. cats example.
Loosely speaking, the main idea is that it transforms a vector into probabilities. By scaling via the sum of all exponentiated entries we end up with a total sum of 1. This makes us independent of the actual scale the previous layers of the network worked with and always lands us in the interval \((0, 1)\). The resulting probabilities can be interpreted as the network's confidence in the classification, per class.
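A direct NumPy sketch of the definition above (subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result, since it cancels in the quotient):

```python
import numpy as np

def softmax(x):
    # subtract the max for numerical stability; the shift cancels in the quotient
    z = np.exp(x - np.max(x))
    return z / np.sum(z)

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # all entries lie in (0, 1)
print(probs.sum())  # the entries sum to 1
```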
Let's look at an example: assume we have a simple NN with only two layers, the last being our softmax layer.
Figure A.12: A small NN for classifying hand written digits, we only distinguish between 5, 6 and all others.
We use the handwritten digits from Section A.2 as our example, where we only have three classes: 5, 6, and else. Here else should be understood as neither 5 nor 6. We classify the images from right to left. The output of our first layer is a number with no apparent scaling; after the softmax we can read off the most likely class suggested by our NN.
Figure A.13: From several images to the output of the NN.
As we can see, the exponential scaling makes sure that the classes separate quite well, and we can easily distinguish between the three of them. We can also see when the scores are very close and the NN is not sure which class is correct.
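The two situations above can be reproduced with small made-up score vectors (these are illustrative values, not the actual outputs of the network in Figure A.13):

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / np.sum(z)

# confident case: one score dominates after exponential scaling
print(softmax(np.array([4.0, 1.0, 0.0])))
# uncertain case: scores are close, so the probabilities are nearly uniform
print(softmax(np.array([1.1, 1.0, 0.9])))
```

In the first case one class receives most of the probability mass; in the second, the three probabilities are nearly equal, signalling that the network is unsure.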