When numpy.empty is not faster than numpy.zeros
Many introductions to numpy teach np.ones, np.zeros and np.empty together. The accepted wisdom is that np.empty will be faster than np.zeros or np.ones because it doesn’t have to spend time initialising the memory. A quick test in a Jupyter notebook suggests that this is true!
    import numpy as np

    N = 200
    zero_time = %timeit -o some_zeros = np.zeros((N,N))
    empty_time = %timeit -o empty_matrix = np.empty((N,N))
    print('np.empty is {0} times faster than np.zeros'.format(zero_time.average/empty_time.average))

    8.34 µs ± 202 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    436 ns ± 10.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    np.empty is 19.140587654682545 times faster than np.zeros
A speed-up of almost 20 times may well be useful in production when using really big matrices. It might even be worth the risk of dealing with uninitialised variables, even though they are scary!
However, on my machine (Windows 10, Microsoft Surface Book 2 with 16 GB RAM), we see the following behaviour with a larger matrix size (1000 x 1000).
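As a minimal sketch of how that risk can be contained: np.empty is only safe when every element is written before anything reads the array. The function name squares_table below is hypothetical and exists purely for illustration.

```python
import numpy as np

def squares_table(n):
    """Return the squares of 0..n-1, using np.empty for the output buffer."""
    # The contents of `out` are arbitrary at this point and must not be read.
    out = np.empty(n, dtype=np.float64)
    # Assumption: this single call overwrites every element of `out`,
    # so no uninitialised value can leak to the caller.
    np.square(np.arange(n, dtype=np.float64), out=out)
    return out

table = squares_table(5)
print(table)
```

The pattern only pays off if the write really does cover the whole array; a partial fill would leave arbitrary values behind.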
    import numpy as np

    N = 1000
    zero_time = %timeit -o some_zeros = np.zeros((N,N))
    empty_time = %timeit -o empty_matrix = np.empty((N,N))
    print('np.empty is {0} times faster than np.zeros'.format(zero_time.average/empty_time.average))

    113 µs ± 2.97 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    112 µs ± 1.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    np.empty is 1.0094651980894993 times faster than np.zeros
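The speed-up has vanished! A subsequent discussion on Twitter suggests that this is probably because the operating system is zeroing all of the memory for security reasons when using numpy.empty on large arrays. If large allocations arrive already zeroed, np.zeros has little extra work left to do, so the two calls would cost about the same.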
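One way to probe this explanation is to separate the cost of asking for memory from the cost of first writing to it. The sketch below is not a definitive benchmark: the helper names alloc_only, alloc_and_touch and best_time are hypothetical, it uses the standard timeit module so that it runs outside Jupyter, and the numbers will depend on the operating system and allocator.

```python
import timeit
import numpy as np

N = 1000  # same matrix size as the larger benchmark

def alloc_only(make):
    # Only requests the array; the memory may not have been touched yet.
    return make((N, N))

def alloc_and_touch(make):
    a = make((N, N))
    a.fill(1.0)  # writes every element, so every page must really exist
    return a

def best_time(func, make, repeat=3, number=5):
    """Best per-call time in seconds over `repeat` runs of `number` calls."""
    timings = timeit.repeat(lambda: func(make), repeat=repeat, number=number)
    return min(timings) / number

results = {
    (func.__name__, make.__name__): best_time(func, make)
    for func in (alloc_only, alloc_and_touch)
    for make in (np.zeros, np.empty)
}
for (func_name, make_name), seconds in results.items():
    print(f"{func_name} with np.{make_name}: {seconds * 1e6:.1f} µs")
```

If the explanation holds, the two alloc_and_touch timings should be close to each other, because the real cost is paid when the memory is first written rather than when it is requested.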
I don’t do much Python programming, but maybe it’s the malloc vs calloc issue described here. This is the best explanation I have ever seen of this phenomenon. https://vorpus.org/blog/why-does-calloc-exist/
Seems like the value of N plays a weird role in np.zeros. For me, np.zeros((1000, 1000)) is somehow 65 times slower (469μs) than np.zeros((10000, 10000)) (7.1μs).
I took the liberty of posting this as a question on StackOverflow: https://stackoverflow.com/questions/58076522/why-is-the-speed-difference-between-pythons-numpy-zeros-and-empty-functions-gon
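The dependence on N raised in the comments can be checked by sweeping over several sizes instead of testing only two. This is a rough sketch: the list of sizes and the helper name time_alloc are assumptions chosen for illustration, and it uses the standard timeit module rather than the %timeit magic so that it runs as a plain script.

```python
import timeit
import numpy as np

def time_alloc(make, n, repeat=3, number=20):
    """Best per-call time in seconds for allocating an n x n float64 array."""
    timings = timeit.repeat(lambda: make((n, n)), repeat=repeat, number=number)
    return min(timings) / number

sizes = [100, 200, 500, 1000, 2000]  # hypothetical sizes, adjust to taste
ratios = {}
for n in sizes:
    t_zeros = time_alloc(np.zeros, n)
    t_empty = time_alloc(np.empty, n)
    ratios[n] = t_zeros / t_empty
    print(f"N={n}: zeros {t_zeros * 1e6:.2f} µs, "
          f"empty {t_empty * 1e6:.2f} µs, ratio {ratios[n]:.2f}")
```

Where the ratio changes sharply, and whether it does at all, is likely to differ between machines, so the output is best read as a description of one system rather than of numpy in general.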