A parallel programming tip for Mathematica

June 29th, 2009 | Categories: math software, mathematica, programming

Someone emailed me recently complaining that the ParallelTable function in Mathematica didn’t work. In fact it made his code slower! Let’s take a look at an instance where this can happen and what we can do about it. I’ll make the problem as simple as possible to allow us to better see what is going on.

Let’s say that all we want to do is generate a list of the prime numbers as quickly as possible using Mathematica. The code to generate the first 20 million primes is very straightforward.

primelist = Table[Prime[k], {k, 1, 20000000}];

To time how long this list takes to calculate we can use the AbsoluteTime function as follows.

t = AbsoluteTime[];
primelist = Table[Prime[k], {k, 1, 20000000}];
time2 = AbsoluteTime[] - t

This took about 63 seconds on my machine. A quick look at the Mathematica documentation shows that the parallel version of Table is, obviously enough, ParallelTable. What could be simpler? So, to spread this calculation over both cores of a dual-core laptop we simply need to do the following:

t = AbsoluteTime[];
primelist = ParallelTable[Prime[k], {k, 1, 20000000}];
time2 = AbsoluteTime[] - t

The result? 98 seconds! So, it seems that the parallel version is more than 50% SLOWER! My correspondent was right – ParallelTable doesn’t seem to work at all.

Before we send a message to Wolfram Tech Support though, let’s dig a little deeper. It turns out that a single evaluation of Prime[k] for any given k is very quick, even for large k. Prime[100000000], for example, evaluates in a tiny fraction of a second (try it! It really is astonishingly quick).
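
If you want to check this on your own machine, here is a minimal sketch. It uses the built-in AbsoluteTiming function, which I don't use elsewhere in this post, and the variable names seconds and p are just ones I picked for illustration.

```mathematica
(* AbsoluteTiming returns {elapsed wall-clock seconds, result} *)
{seconds, p} = AbsoluteTiming[Prime[100000000]];

(* Show how long the single call took and the prime it returned *)
{seconds, p}
```

The exact timing will depend on your hardware, so I haven't quoted a number here.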

What I suspect is happening here is that when you move to the parallel version, the kernels spend most of the time communicating with each other rather than actually calculating anything. I think that ParallelTable does something roughly like the following.

Kernel 1: Give me a k.
Master: Have k=1.
Kernel 2: Give me a k.
Master: Have k=2.
Kernel 1: I’ve done k=1 and the answer is 2. Give me another k.
Master: Have k=3.
Kernel 2: I’ve done k=2 and the answer is 3. Give me another k.
...and so on.

So, the kernels spend more time talking about the work that needs to be done than actually doing it. Also, for various reasons, Kernel 1 and Kernel 2 might get out of sync, so the Master kernel may end up with a list that is out of order. If so, it will need to reorder the list at the end of the calculation, which takes even more time.
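
One way to probe this suspicion is to force the two extremes of scheduling and compare them. The sketch below assumes the documented Method option of ParallelTable with the settings "FinestGrained" and "CoarsestGrained". The range nTest is an illustrative value that I have kept small so the fine-grained run finishes in reasonable time, and I have not timed either run myself.

```mathematica
(* Illustrative, deliberately small range; not the 20 million used above *)
nTest = 20000;

(* Smallest possible pieces of work: the most master/subkernel communication *)
{tFine, fine} = AbsoluteTiming[
   ParallelTable[Prime[k], {k, 1, nTest}, Method -> "FinestGrained"]];

(* One large piece per kernel: the least communication *)
{tCoarse, coarse} = AbsoluteTiming[
   ParallelTable[Prime[k], {k, 1, nTest}, Method -> "CoarsestGrained"]];

(* The two timings, plus a check that both runs produced the same list *)
{tFine, tCoarse, fine === coarse}
```

If communication really is the bottleneck, I would expect tFine to come out noticeably larger than tCoarse, but that is a guess until you run it.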

So, my approach was to try and reduce the amount of communication between kernels to something like

Master: Kernel 1 – you go away and get me the first 10 million primes. Don’t bother me until it’s done.
Master: Kernel 2 – you go away and get me the second 10 million. Don’t bother me until it’s done.

The following Mathematica code does this.

t = AbsoluteTime[];
job1 = ParallelSubmit[Table[Prime[k], {k, 1, 10000000}]];
job2 = ParallelSubmit[Table[Prime[k], {k, 10000001, 20000000}]];
{a1, a2} = WaitAll[{job1, job2}];
time2 = AbsoluteTime[] - t

This runs in 40 seconds compared to the original time of about 63 seconds. Not a 2x speedup but not too shabby for so little extra work on our part.
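
The two-job version leaves its results in a1 and a2 rather than in a single list, and it hard-codes two kernels. Here is a sketch of how the same idea might be generalised to however many kernels are running. The names n, blocks, size, jobs, lo and hi are my own, the sketch assumes that at least one subkernel launches successfully, and I haven't timed this variant.

```mathematica
n = 20000000;

(* Start subkernels only if none are running yet *)
If[$KernelCount == 0, LaunchKernels[]];

(* One contiguous block of indices per available kernel *)
blocks = $KernelCount;
size = Ceiling[n/blocks];

(* ParallelSubmit holds its argument, so With is used to insert the
   numeric bounds of each block before the job is queued *)
jobs = Table[
   With[{lo = (i - 1) size + 1, hi = Min[i size, n]},
    ParallelSubmit[Table[Prime[k], {k, lo, hi}]]],
   {i, 1, blocks}];

(* WaitAll gives the results in submission order; Join makes one list *)
primelist = Join @@ WaitAll[jobs];

(* Sanity check: we should have exactly n primes *)
Length[primelist] == n
```

With two kernels this reduces to the same two blocks of 10 million as before, with primelist equal to Join[a1, a2].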

The moral of the story? Make sure that your code spends more time actually doing work than it does just talking about it.

Disclaimer: I am still learning about Mathematica’s parallel tools myself so don’t take this stuff as gospel. Also, there are almost certainly more efficient ways of getting a list of primes than using the Prime[k] function. I am only using Prime[k] here because it is an example of a function that evaluates very quickly.

Walking Randomly