8 NumPy

NumPy is short for Numerical Python and is the workhorse of numerical computing and, by extension, of scientific computing.

The most important things you will find in NumPy are:

8.1 Motivation

Speed is often the main motivation for using NumPy.

Let us look at a simple example, the scalar product, to see why.
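For reference, the scalar (dot) product of two vectors $a$ and $b$ of length $n$ is the sum of their element-wise products. This is exactly what the `dot` function in the code computes with an explicit loop. As a small worked case, $(1, 2, 3) \cdot (4, 5, 6) = 4 + 10 + 18 = 32$.

```latex
a \cdot b = \sum_{i=1}^{n} a_i \, b_i
```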

import random
import timeit
import numpy as np

random.seed(42)

min_value = 1 
max_value = 100

# Generate two lists of 1_000_000 random integers each
a = [random.randint(min_value, max_value) for _ in range(1_000_000)]
b = [random.randint(min_value, max_value) for _ in range(1_000_000)]

def dot(a, b):
    r = 0
    for first, second in zip(a, b):
        r += first * second
    return r

t1 = timeit.timeit(lambda: dot(a, b), number=1000)

np_a = np.array(a, dtype=np.int64)
np_b = np.array(b, dtype=np.int64)

t2 = timeit.timeit(lambda: np.vdot(np_a, np_b), number=1000)

print(f"loop result {dot(a,b)}, time {t1=}")
print(f"numpy result {np.vdot(np_a, np_b)}, time {t2=}")
print(f"Speedup {t1/t2}")
loop result 2550205506, time t1=54.012639723999996
numpy result 2550205506, time t2=0.60601776499999
Speedup 89.12715574270482

For the above example we see a speedup of roughly 89. With NumPy, speedups of 10 to 100, or even more, are common.
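Much of the gain comes from where the loop runs. The pure-Python `dot` executes one interpreted iteration per element on boxed Python integers. `np.vdot` instead runs a compiled loop over a contiguous buffer of fixed-width `int64` values. The sketch below uses small, hypothetical input values, not the benchmark data. It checks that several NumPy spellings of the same scalar product (`np.vdot`, `np.dot`, the `@` operator, and an element-wise product followed by `.sum()`) agree with the loop. It also prints the fixed 8-byte element size of the array.

```python
import numpy as np


def dot(a, b):
    """Pure-Python scalar product: one interpreted loop iteration per element."""
    r = 0
    for first, second in zip(a, b):
        r += first * second
    return r


# Small illustrative inputs (hypothetical values, not from the benchmark above)
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]

np_a = np.array(a, dtype=np.int64)
np_b = np.array(b, dtype=np.int64)

# Equivalent vectorized spellings of the same scalar product; each one
# loops in compiled code over the contiguous int64 buffers.
results = {
    "loop": dot(a, b),
    "np.vdot": int(np.vdot(np_a, np_b)),
    "np.dot": int(np.dot(np_a, np_b)),
    "@ operator": int(np_a @ np_b),
    "elementwise product + sum": int((np_a * np_b).sum()),
}

for name, value in results.items():
    print(f"{name:>26}: {value}")

# The array stores fixed-width 8-byte integers back to back
print(np_a.dtype, np_a.itemsize, np_a.nbytes)
```

Every variant prints 70 (5 + 12 + 21 + 32), and the last line prints `int64 8 32`. For 1-D arrays, `np.dot` and `@` are interchangeable. `np.vdot` behaves the same way for real-valued input, but it conjugates its first argument when the input is complex.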

Note

Recent Python releases have made the interpreter's loops considerably faster, so with older versions of Python you will see a much higher speedup for this example.

8.2 References

Since the sister class Basics of Data Science already shows many of the basics in action, this introduction ends here. For a longer read we refer to other sources, such as