MATLAB GPU / CUDA experiences and tutorials on my laptop – Introduction

These days it seems that you can’t talk about scientific computing for more than 5 minutes without someone bringing up the topic of Graphics Processing Units (GPUs).  Originally designed to make computer games look pretty, GPUs are massively parallel processors that promise to revolutionise the way we compute.

A brief glance at the specification of a typical laptop suggests why GPUs are the new hotness in numerical computing.  Take my new one for instance, a Dell XPS L702X, which comes with a Quad-Core Intel i7 Sandybridge processor running at up to 2.9GHz and an NVIDIA GT 555M with a whopping 144 CUDA cores.  If you went back in time a few years and told a younger version of me that I’d soon own a 148 core laptop then young Mike would be stunned.  He’d also be wondering ‘What’s the catch?’

Of course the main catch is that all processor cores are not created equally.  Those 144 cores in my GPU are, individually, rather wimpy when compared to the ones in the Intel CPU.  It’s the sheer quantity of them that makes the difference.  The question at the forefront of my mind when I received my shiny new laptop was ‘Just how much of a difference?’

Now I’ve seen lots of articles that compare CPUs with GPUs and the GPUs always win… a lot!  Dig down into the meat of these articles, however, and it turns out that things are not as simple as they seem.  Roughly speaking, the abstract of some of them could be summed up as ‘We took a serial algorithm written by a chimpanzee for an old, outdated CPU and spent 6 months parallelising and fine tuning it for a top of the line GPU.  Our GPU version is up to 150 times faster!’

Well it would be wouldn’t it?!  In other news, Lewis Hamilton can drive his F1 supercar around Silverstone faster than my dad can in his clapped out 12 year old van!  These articles are so prevalent that an excellent article was recently published summarising everything you should consider when evaluating them.  What you do is take the claimed speed-up, apply a set of common sense questions and thus determine a realistic speed-up.  That factor of 150 can end up more like a factor of 8 once you think about it the right way.

That’s not to say that GPUs aren’t powerful or useful…it’s just that maybe they’ve been hyped up a bit too much!

So anyway, back to my laptop.  It doesn’t have a top of the range GPU custom built for scientific computing; instead it has what has been described as a fast middle class graphics card for laptops.  It’s got all of the required bits though….144 cores and CUDA compute level 2.1 so surely it can whip the built in CPU even if it’s just by a little bit?

I decided to find out with a few randomly chosen tests.  I wasn’t aiming for the kind of rigour that would lead to a peer reviewed journal but I did want to follow some basic rules at least:

  • I will only choose algorithms that have been optimised and parallelised for both the CPU and the GPU.
  • I will release the source code of the tests so that they can be criticised and repeated by others.
  • I’ll do the whole thing in MATLAB using the new GPU functionality in the parallel computing toolbox.  So, to repeat my calculations all you need to do is copy and paste some code.  Using MATLAB also ensures that I’m using good quality code for both CPU and GPU.
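To give a flavour of what ‘copy and paste some code’ means here, the Parallel Computing Toolbox workflow boils down to moving an array onto the GPU, calling ordinary MATLAB functions on it, and gathering the result back.  This is just an illustrative sketch (the matrix size and the choice of fft are mine, not from the tests in the series); gpuArray, gather and GPU-enabled fft are genuine toolbox functionality:

```matlab
% Minimal sketch of the Parallel Computing Toolbox GPU workflow.
A = rand(2000);        % an ordinary array in CPU memory

G = gpuArray(A);       % transfer it to the GPU

tic
F = fft(G);            % many built-ins, fft included, accept gpuArray inputs
toc                    % time the GPU computation

result = gather(F);    % bring the answer back to CPU memory
```

The nice thing about this model is that the CPU and GPU versions of a test differ only in whether the input is a gpuArray, which is what makes like-for-like comparisons so easy to repeat.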

The articles

This is the introduction to a set of articles about GPU computing on MATLAB using the parallel computing toolbox.  Links to the rest of them are below and more will be added in the future.

External links of interest to MATLABers with an interest in GPUs

  1. MySchizoBuddy
    July 27th, 2011 at 22:47

    SGI’s recent hot seat presentation at ISC 2011 shows SGI betting Intel’s MIC architecture will be the future of supercomputers and NOT GPU’s. In contrast Cray hot seat presentation bets on GPU’s to be the future of supercomputers.

    SGI presentation video.

  2. July 28th, 2011 at 02:54

    Looking forward to future posts :)

  3. GPUJunkie
    July 28th, 2011 at 22:40

    Like I said on the other thread, when GPUs are the right tool for the right job, they rock.

    Which is to say if you take a computation-intensive or even coherently memory-intensive algorithm that can be parallelized coarsely into sub-tasks and those sub-tasks can each be subdivided into SIMD-friendly operations, you’ll beat the socks off all the cores on multiple CPUs on instruction issue rate alone. Intel’s MIC will show similar behavior here – it’s really nothing magical.

    But if you can’t do that because the algorithm is inherently serial, there’s not much a GPU can do for you unless you have lots and lots of instances of that algorithm to solve.

    That said, what CUDA and OpenCL do today is cleanly abstract away SIMD and multithreading in an OS and platform-independent manner. This allows a scientist/hedge funder/non-engineer who could manage little better than chimpanzee code on a greatest day of their life to suddenly harness all the cores and SIMD lanes of their CPU/GPU/toaster on a regular basis.

    And when he/she sees that magical 150x perf boost and brags about it, along will come some fudgy pants CPU hotshot who previously wouldn’t have given this poor soul the time of day who will now *insist* on optimizing the original CPU monkey code to show this deluded GPU fanboy/girl the error of his/her ways.

    In cases like this (and I’ve seen it several times now), I’d say the CPU expert just got snookered here into doing free work for some half-wit, but maybe I’m being too cynical. If I were being positive, I’d say it’s pretty awesome that an amateur coder got that CPU expert’s attention in the first place.

  4. July 28th, 2011 at 23:10

    @GPUJunkie ‘Like I said on the other thread, when GPUs are the right tool for the right job, they rock.’

    I totally agree with you! I think you and I started off on the wrong foot because my first example showed a case when the GPU wasn’t the right tool for the job. I have every intention of showing when they ARE. I have no anti-GPU agenda…I have every intention of being as balanced as I can be.

    My first article shows GPUs in a relatively negative light. The next will show them in a better light (still via a contrived example but it will be better). The one after that will show them at their best before going onto one that’s about even and so on.

    The over-hyping is a huge issue! I recently looked at a commercial application that had just released a GPU add-on. Their advertising blurb said ‘Up to six times faster than the CPU version’. Do you know why one of our HPC managers dismissed it? It was because he had read all those 150x speed-up papers without digesting the details. This company sold themselves short because they were brutally honest…they compared the best CPU against the best GPU and the result was a factor of 6.

    For the timeframe of the calculations involved, a factor of 6 was awesome! Especially when you consider the relatively low cost of the GPUs that provided it. Manager was having none of it though…6 was crappy for a GPU vs CPU according to him.

    Sometimes GPUs are the right choice, sometimes they are not. Yes they rock and they rock enough that they shouldn’t have to peddle bullshit claims to get the point across.

  5. GPUJunkie
    July 29th, 2011 at 18:36

    @Mike: Realistically, a single GTX580 can level anywhere from 4-12 current top of the line Intel CPUs, which is to say it will hit the performance of 16-72 cores, on an entirely (and sometimes seemingly arbitrary) per-algorithm basis. I know this because I’ve done it multiple times and the most prominent instance of that is currently responsible for ~3.1 PFLOPS of Folding@Home. In that case, the performance differential is even starker: one GTX580 ought to match about 30 Intel CPUs (though they like to claim 120x CPUs (assuming 6 cores per CPU) which is indeed true if one compares to craptastic unoptimized CPU code). But notice that the Intel paper never brought up Folding@Home wherein Intel is getting handed its ego from AMD, NVIDIA, and Sony simultaneously.

    I think if you really want to make a contribution here, why don’t you redo the Intel FUD paper in as unbiased a manner as possible? It’s a good concept, but the paper’s execution was tainted from the get-go. You need to seek out the experts who have spent months to years on each of these algorithms and compare their best work to Intel’s. I think you’ll be surprised. Also watch out for algorithmic cherry-picking on both sides of the matter (though to be fair if the algorithm doesn’t parallelize it doesn’t belong on a GPU, period).

    But enough about GPUs, let’s talk about your HPC manager. Was he disappointed by GPU performance matching 6 cores or 6 CPUs? If the former, he’s a smart man, and that’s a disappointing number. If the latter, he needs to be fired, today. The GPU is the uberchoice in that case on perf/W alone.

    As for marketing BS, get real. You’re stuck with it. You need to work with smart people who can filter that and get to the meat rather than complain about what you can’t change.

  6. GPUJunkie
    July 30th, 2011 at 02:03
  7. June 27th, 2012 at 10:30

    Thanks for a great series of articles! While making my way through the individual test pages, I noticed that the URL for the last one is a copy/paste error; it should be (“…Using the GPU via Jacket”). Just wanted to let you know.

  8. June 27th, 2012 at 14:56

    Hi Chris

    Glad you enjoyed them :) Thanks for the dead link info…should be fixed now :)
    Best Wishes,

  9. July 5th, 2012 at 17:55
    It is worth noting that China built the (once) world’s fastest supercomputer using GPUs. Our own system uses a combination of CPUs and GPUs, two GPUs per node.

    Whenever I read claims of phenomenal performance, I dismiss them out of hand unless I know and trust the author. Like other commenters above, I have found way too much under-handed manipulation of results to trust anything I read, even in peer-reviewed journals.

    GPUs seem to me to be a far more generic technology than the MIC. There is even a language that generates code for all comers. MIC seems like a company-specific chip that attempts to undo the broad and open approach of GPUs. It would take one heck of a lot of performance improvement for me to switch.

    Great set of articles and links me man!!