8 NumPy
NumPy is short for Numerical Python and is the workhorse of numerical computing and, by extension, of scientific computing.
The most important things you find in NumPy are:
- `ndarray`: an efficient multidimensional array class providing arithmetic operations and broadcasting capabilities.
- Subpackages for linear algebra (`linalg`), random numbers (`random`), or the Fourier transform (`fft`), to name a few.
- Vectorized mathematical functions that work on entire arrays, so you do not need to implement them yourself with loops (for example, the operations of the Basic Linear Algebra Subprograms, BLAS).
- I/O tools for reading and writing arrays to disk and working with memory-mapped files.
- A C API for connecting NumPy with libraries written in C/C++ and FORTRAN, like BLAS and LAPACK.
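The following is a minimal sketch touring several of the features listed above. The array shapes and values, the seed, and the temporary file name `example.npy` are arbitrary choices for illustration and do not come from the source.

```python
import os
import tempfile

import numpy as np

# ndarray: a 2x3 array of 64-bit integers
m = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)
print(m.shape, m.dtype)  # (2, 3) int64

# Element-wise arithmetic with broadcasting: the 1-D row (shape (3,))
# is stretched across both rows of m (shape (2, 3))
row = np.array([10, 20, 30])
shifted = m + row
print(shifted)

# Vectorized math function (a ufunc) applied to the whole array, no Python loop
roots = np.sqrt(m)

# linalg: solve the linear system A x = y
A = np.array([[3.0, 1.0], [1.0, 2.0]])
y = np.array([9.0, 8.0])
x = np.linalg.solve(A, y)
print(x)  # x = 2, y = 3 (up to floating-point rounding)

# random: reproducible random numbers from a seeded Generator
rng = np.random.default_rng(42)
samples = rng.integers(1, 101, size=5)  # integers in [1, 100]

# fft: discrete Fourier transform of a constant signal;
# all of its energy lands in the zero-frequency bin
spectrum = np.fft.fft(np.ones(4))

# I/O: write an array to disk, read it back, and open it memory-mapped
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npy")
    np.save(path, m)
    loaded = np.load(path)
    mapped = np.load(path, mmap_mode="r")  # data is read from disk on access
    total = int(loaded.sum())
    corner = int(mapped[1, 2])
    del mapped  # release the memory map before the directory is removed
print(total, corner)  # 21 6
```

Note that none of these operations contains an explicit Python loop; the iteration happens inside NumPy's compiled code.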
8.1 Motivation
Speed is often the main motivation for using NumPy.
Let us look at the simple example of the scalar (dot) product to see why.
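For reference, the scalar product of two vectors $\mathbf{a}$ and $\mathbf{b}$ of length $n$, which both the pure-Python loop and `np.vdot` compute in the example, is

$$
\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i \, b_i .
$$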
import random
import timeit
import numpy as np

random.seed(42)
min_value = 1
max_value = 100

# Generate two lists of 1_000_000 random numbers each
a = [random.randint(min_value, max_value) for _ in range(1_000_000)]
b = [random.randint(min_value, max_value) for _ in range(1_000_000)]

# Scalar product with a plain Python loop
def dot(a, b):
    r = 0
    for first, second in zip(a, b):
        r += first * second
    return r

t1 = timeit.timeit(lambda: dot(a, b), number=1000)

# Same computation with NumPy arrays
np_a = np.array(a, dtype=np.int64)
np_b = np.array(b, dtype=np.int64)
t2 = timeit.timeit(lambda: np.vdot(np_a, np_b), number=1000)

print(f"loop result {dot(a,b)}, time {t1=}")
print(f"numpy result {np.vdot(np_a, np_b)}, time {t2=}")
print(f"Speedup {t1/t2}")
loop result 2550205506, time t1=54.012639723999996
numpy result 2550205506, time t2=0.60601776499999
Speedup 89.12715574270482
For the above example we see a speedup of about 89. With NumPy we will often see speedups of 10 to 100, sometimes even more.
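The benchmark uses `np.vdot`. For 1-D real-valued arrays it gives the same value as `np.dot`, the `@` operator, or an element-wise multiplication followed by a sum. The small sketch below checks this with hand-picked values (not from the source) and compares each result against the pure-Python loop:

```python
import numpy as np


def dot(a, b):
    # Pure-Python scalar product, same logic as in the benchmark
    r = 0
    for first, second in zip(a, b):
        r += first * second
    return r


a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
np_a = np.array(a, dtype=np.int64)
np_b = np.array(b, dtype=np.int64)

# Equivalent ways to compute the scalar product
results = {
    "loop": dot(a, b),
    "np.vdot": np.vdot(np_a, np_b),
    "np.dot": np.dot(np_a, np_b),
    "@": np_a @ np_b,
    "multiply + sum": (np_a * np_b).sum(),
}
print(results)  # every entry is 70
```

`np.vdot` also flattens its inputs and complex-conjugates the first argument, which makes no difference for real 1-D arrays. Note also that the benchmark result 2550205506 is larger than the largest 32-bit integer (2147483647). Requesting `dtype=np.int64` explicitly therefore avoids overflow on platforms where NumPy's default integer type is 32 bits.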
Note
Recent Python versions have invested in making loops faster. With older versions of Python you will see a much higher speedup for this example.
8.2 References
As we cover many of the basics in action in the sister class Basics of Data Science, this introduction ends here, and we refer to other sources for a longer read, such as:
- McKinney (2022) (direct link to the NumPy Basics)
- MCI lecture notes of Julian Huber and Matthias Panny (online)
- NumPy docs