## Slow multicore performance on Ubuntu Linux - and how to fix it

April 19th, 2010 | Categories: Linux, parallel programming, programming

I was recently playing with some parallel code that used gfortran and OpenMP to parallelise a calculation over 4 threads, and I was getting some seriously odd timing results. The first time I ran the code it completed in 4.8 seconds, and the next run also took 4.8 seconds, but the third run took 84 seconds (no, I haven’t missed off the decimal point). Subsequent timings were all over the place: 12 seconds, 4.8 seconds again, 14 seconds... Something weird was going on.

I wouldn’t mind, but all of these calculations had exactly the same input and output, and yet the parallel code sometimes took significantly longer to execute than the serial version.
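Run-to-run variation like this is easiest to see when the same command is timed several times in a row. The following is a minimal sketch, assuming GNU `date` (for the `%N` nanosecond field); the function name `time_runs` and the binary name `./my_parallel_program` are hypothetical and not part of the original code.

```shell
# Run a command several times and print the wall-clock time of each run.
time_runs() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s%N)   # nanoseconds since the epoch (GNU date)
        "$@" > /dev/null      # the command's own output is discarded
        end=$(date +%s%N)
        # Convert the elapsed nanoseconds to whole milliseconds
        echo "run $i: $(( (end - start) / 1000000 )) ms"
        i=$((i + 1))
    done
}

# Demonstration with a harmless command
time_runs 3 sleep 0.1

# For an OpenMP program on 4 threads (hypothetical binary name):
# time_runs 5 env OMP_NUM_THREADS=4 ./my_parallel_program
```

If the timings are stable, every line should report roughly the same figure; a spread like 4.8 seconds versus 84 seconds would stand out immediately.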

On drilling down, I discovered that two of the threads had 100% CPU utilisation and the other two only had 50%. On a hunch, I wondered if the CPU frequency ‘on demand’ scaling was messing things up (it has done so in the past), so I ran

sudo cpufreq-selector -c 0 -f 3000
sudo cpufreq-selector -c 1 -f 3000
sudo cpufreq-selector -c 2 -f 3000
sudo cpufreq-selector -c 3 -f 3000

to set the CPU frequency of all 4 cores to the maximum of 3 GHz. This switched off the ‘on demand’ setting that is standard in Ubuntu.
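On a machine with more cores, the four commands can be generalised into a loop. This is only a sketch: the function name `pin_all_cores` and the `CPUFREQ_CMD` variable are hypothetical, the latter added purely so the loop can be dry-run without root. The `-c` and `-f` options and the value 3000 are copied from the commands above, and cores are assumed to be numbered from 0.

```shell
# Apply one frequency setting to every core, one cpufreq-selector call per core.
pin_all_cores() {
    freq=$1
    ncores=${2:-$(nproc)}   # default: the number of processors nproc reports
    c=0
    while [ "$c" -lt "$ncores" ]; do
        # CPUFREQ_CMD is a hypothetical override; unset, the real command runs
        ${CPUFREQ_CMD:-sudo cpufreq-selector} -c "$c" -f "$freq"
        c=$((c + 1))
    done
}

# Dry run: print the commands rather than executing them
CPUFREQ_CMD="echo sudo cpufreq-selector"
pin_all_cores 3000 4
```

With `CPUFREQ_CMD` left unset, the same call would execute the commands for real.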

Lo! It worked! 4.8 seconds every time. When I turned the governor back on with

sudo cpufreq-selector --governor ondemand

I got back the ‘sometimes it’s fast, sometimes it’s slow’ behaviour. Oddly, one week and several reboots later I can’t get back the slow behaviour no matter what I set the governor to.
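When the behaviour changes between sessions like this, it may help to confirm which governor and frequency each core is actually using. A minimal sketch, assuming the kernel exposes the usual cpufreq files under `/sys/devices/system/cpu` (some virtual machines do not); the function name `show_cpufreq` is hypothetical.

```shell
# Print the governor and current frequency of every core.
show_cpufreq() {
    root=${1:-/sys/devices/system/cpu}
    for dir in "$root"/cpu[0-9]*; do
        # Skip cores that expose no readable cpufreq interface
        [ -r "$dir/cpufreq/scaling_governor" ] || continue
        [ -r "$dir/cpufreq/scaling_cur_freq" ] || continue
        gov=$(cat "$dir/cpufreq/scaling_governor")
        khz=$(cat "$dir/cpufreq/scaling_cur_freq")   # reported in kHz
        echo "${dir##*/}: $gov, $((khz / 1000)) MHz"
    done
}

show_cpufreq
```

Running this while the parallel job is active would show whether some cores are sitting at a lower frequency than others.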

Perhaps this was just a temporary glitch in my system but, as I said earlier, I have seen this sort of behaviour before. So, just to be on the safe side, it might be worth switching off the automatic governor whenever you do parallel calculations on Linux.
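One way to make that a habit is to wrap the calculation so the governor is switched before the run and restored afterwards. The sketch below uses the kernel's sysfs files rather than `cpufreq-selector`, which is a swapped-in technique and not what the post used. It assumes every core starts on the same governor as cpu0 and that the `performance` governor is available. The function name `run_with_governor` and the binary name are hypothetical.

```shell
# Run a command with every core on a given governor, then restore the old one.
# Writing to the real sysfs files needs root.
run_with_governor() {
    gov=$1
    root=$2
    shift 2
    # Remember the governor currently set on cpu0 (assumed the same on all cores)
    old=$(cat "$root/cpu0/cpufreq/scaling_governor")
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        echo "$gov" > "$f"
    done
    "$@"
    rc=$?   # keep the command's exit status
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        echo "$old" > "$f"
    done
    return "$rc"
}

# Example, as root (hypothetical binary name):
# run_with_governor performance /sys/devices/system/cpu ./my_parallel_program
```

Note that in this form the calculation itself also runs as root, so it may be preferable to switch the governor in one privileged step and run the job separately as a normal user.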

Does anyone have any insight into this? Comments welcomed.

1. What version of Ubuntu are you using? I had the same problem under Ubuntu 9.04, but 10.04 handles parallelization (I ran a parallelization of a block matrix multiplication written in C, using OpenMP) a lot better.

Still, the governor *can* bring the frequency back to the minimum if the CPU overheats (I used a temperature monitor to test), which is actually a good thing, for the computer’s sake. So I guess it was just my laptop not being able to handle the heat.

2. I think it was 9.04 but might have been 9.10.

If the governor halved the CPU frequency to protect the computer and this doubled the execution time, then that would be fine. What I was seeing was much worse than that, with execution taking SIGNIFICANTLY longer than the serial version at any frequency.