Introduction

In this class we are going to look at the basic concepts behind modern-day Data Science and machine learning techniques for industrial image processing. We will always try to illustrate the theory directly with Python to show how the concepts work programmatically.

The following books cover large sections of the content and serve as the main references for these notes. They are an excellent read for getting deeper into the topics presented, and some also include code in Python and MATLAB:

These notes are intended for engineering students and therefore the mathematical concepts are rarely accompanied by rigorous proofs.

We start by discussing Clustering and Classification, first for 1  Unsupervised learning and then for 2  Supervised learning. In both sections we discuss some of the most prominent examples, how they work, and embed them in the context and the various aspects of the classification task. In 3  Semi-Supervised learning we address the topic of label quality and show how we can work with only a few labels and still provide good data for supervised methods. A small sketch contrasting the two settings follows below.
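To make the distinction concrete, the following sketch contrasts an unsupervised clustering step, which uses no labels, with a supervised classifier trained on the same data. It uses scikit-learn and synthetic blob data purely as an illustration; the notes themselves do not prescribe these choices here.

```python
# Minimal sketch: unsupervised vs. supervised learning on the same toy data.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 2D data with three groups (placeholder for real image features).
X, y = make_blobs(n_samples=200, centers=3, random_state=0)

# Unsupervised: cluster assignment without ever looking at the labels y.
clusters = KMeans(n_clusters=3, random_state=0).fit_predict(X)

# Supervised: a classifier trained on the known labels y.
classifier = KNeighborsClassifier().fit(X, y)
print(clusters[:5], classifier.predict(X[:5]))
```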

Next we take a look at Data management and data engineering, where we discuss basic strategies for storing data and models, and pipelines to process and move data. With dvc we introduce a tool for connecting data, code, and trained models in a reliable and reproducible way. Furthermore, we discuss the importance of an ETL process and give a simple yet powerful example by transforming the image basis to wavelets, sketched below.
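As a preview of that wavelet example, a single transformation step in such an ETL pipeline might look like the following sketch. It assumes the PyWavelets package (`pywt`) and uses a random array as a stand-in for a loaded grayscale image.

```python
# Minimal sketch: one ETL step that moves an image into a wavelet basis.
import numpy as np
import pywt

image = np.random.rand(256, 256)  # placeholder for a real grayscale image

# Single-level 2D discrete wavelet transform:
# approximation coefficients plus horizontal/vertical/diagonal details.
cA, (cH, cV, cD) = pywt.dwt2(image, "haar")
print(cA.shape)  # (128, 128): the coarse approximation of the image
```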

In the third and final part of this lecture we focus on Neural Networks and Deep Learning. After introducing neural networks and connecting a very simple network back to known regression techniques, we start building our own neural networks in 7  Neural Networks. For this we work with pytorch but also provide a reference implementation in keras in the appendix. We discuss backpropagation as the key idea behind training neural networks; a minimal example follows below. After this general introduction and definition of the necessary terms we discuss the specific classes 8  Convolutional Neural Networks, 9  Autoencoders, and 10  Transfer learning. For each we provide the general idea and showcase the capabilities. In 11  Data Preparation we address topics regarding the training data and the labelling. This section contains the least code, as these topics are of a general nature and usually already taken care of for the datasets used here. To finish up the section on neural networks, we discuss some 12  Common Challenges in the field. The idea of this last section is to provide some guidance if something is not working as expected.
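As a taste of what 7  Neural Networks builds up to, the following sketch shows the basic pytorch workflow: a tiny network, a forward pass, and the backward (backpropagation) step that computes gradients for all parameters. The layer sizes and dummy data are illustrative assumptions, not part of the lecture material.

```python
# Minimal sketch: forward pass and backpropagation in pytorch.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

x = torch.randn(16, 4)               # dummy batch of 16 samples, 4 features
target = torch.randint(0, 2, (16,))  # dummy class labels

loss = nn.CrossEntropyLoss()(model(x), target)
loss.backward()                       # backpropagation fills the gradients
print(loss.item())
```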

With these notes we cannot hope to cover everything of importance in the depth it deserves, but we can give a glimpse into

  1. data management, labeling and preprocessing
  2. life-cycle management of models
  3. machine learning operations (MLOps)
  4. current and research topics