8  NumPy

NumPy is short for Numerical Python. It is the workhorse of numerical computing in Python and, by extension, of scientific computing.

The most important things you find in NumPy are

8.1 Motivation

Speed is often the main motivation for working with NumPy.

Let us look at the simple example of the scalar product to see why.

import random
import timeit
import numpy as np

random.seed(42)

min_value = 1
max_value = 100

# Generate the two lists of 1_000_000 random numbers each
a = [random.randint(min_value, max_value) for _ in range(1_000_000)]
b = [random.randint(min_value, max_value) for _ in range(1_000_000)]


def dot(a, b):
    r = 0
    for first, second in zip(a, b):
        r += first * second
    return r


t1 = timeit.timeit(lambda: dot(a, b), number=1000)

np_a = np.array(a, dtype=np.int64)
np_b = np.array(b, dtype=np.int64)

t2 = timeit.timeit(lambda: np.vdot(np_a, np_b), number=1000)

print(f"loop result {dot(a,b)}, time {t1 = }")
print(f"numpy result {np.vdot(np_a, np_b)}, time {t2 = }")
print(f"Speedup {t1/t2}")
loop result 2550205506, time t1 = 52.772689919000015
numpy result 2550205506, time t2 = 0.6065118360000099
Speedup 87.01015674655218

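For the above example we see a speedup of about 87. The gain comes from NumPy running the loop in compiled code over a typed array instead of executing every multiplication and addition in the Python interpreter. With NumPy we will often see a speedup of 10 to 100, or even more.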
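To check that the loop and the NumPy call really compute the same quantity, a minimal sketch on a tiny input can help. The two three-element lists are made up for illustration so that the result can be verified by hand. The sketch also shows `np.dot` and the `@` operator, which give the same value as `np.vdot` for real one-dimensional arrays.

```python
import numpy as np

# Small, hypothetical inputs so the result can be checked by hand.
a = [1, 2, 3]
b = [4, 5, 6]

# Pure-Python scalar product: 1*4 + 2*5 + 3*6 = 32
loop_result = sum(x * y for x, y in zip(a, b))

np_a = np.array(a, dtype=np.int64)
np_b = np.array(b, dtype=np.int64)

# For real 1-D arrays these three spellings compute the same scalar product.
vdot_result = np.vdot(np_a, np_b)
dot_result = np.dot(np_a, np_b)
matmul_result = np_a @ np_b

print(loop_result, vdot_result, dot_result, matmul_result)  # prints: 32 32 32 32
```

The three calls only differ for other inputs: `np.vdot` flattens its arguments and takes the complex conjugate of the first one, while `np.dot` and `@` follow matrix multiplication rules for arrays with more than one dimension.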

Note

Recent Python versions have invested in making loops faster, so you will see a much higher speedup for this example with older versions of Python.

8.2 References

As we cover a lot of the basics in action in the sister class Basics of Data Science, this introduction ends here and we refer to others for a longer read like