{"id":3730,"date":"2011-07-27T22:02:29","date_gmt":"2011-07-27T21:02:29","guid":{"rendered":"http:\/\/www.walkingrandomly.com\/?p=3730"},"modified":"2013-03-04T16:24:42","modified_gmt":"2013-03-04T15:24:42","slug":"matlab-gpu-cuda-experiences-and-tutorials-on-my-laptop-introduction","status":"publish","type":"post","link":"https:\/\/walkingrandomly.com\/?p=3730","title":{"rendered":"MATLAB GPU \/ CUDA experiences and tutorials on my laptop &#8211; Introduction"},"content":{"rendered":"<p>These days it seems that you can&#8217;t talk about scientific computing for more than 5 minutes without someone bringing up the topic of <a href=\"http:\/\/en.wikipedia.org\/wiki\/Graphics_processing_unit\">Graphics Processing Units<\/a> (GPUs).\u00a0 Originally designed to make computer games look pretty, GPUs are massively parallel processors that promise to revolutionise the way we compute.<\/p>\n<p>A brief glance at the specification of a typical laptop suggests why GPUs are the new hotness in numerical computing.\u00a0 Take my new one for instance, a <a href=\"http:\/\/www.notebookcheck.net\/Review-Dell-XPS-17-Notebook-i7-2630QM-GT-555M.51949.0.html\">Dell XPS L702X<\/a>, which comes with a <a href=\"http:\/\/www.notebookcheck.net\/Intel-Core-i7-2630QM-Notebook-Processor.41483.0.html\">Quad-Core Intel i7 Sandy Bridge processor<\/a> running at up to 2.9GHz and an <a href=\"http:\/\/www.notebookcheck.net\/NVIDIA-GeForce-GT-555M.41933.0.html\">NVIDIA GT 555M<\/a> with a whopping 144 CUDA cores.\u00a0 If you went back in time a few years and told a younger version of me that I&#8217;d soon own a 148-core laptop then young Mike would be stunned.\u00a0 He&#8217;d also be wondering &#8216;What&#8217;s the catch?&#8217;<\/p>\n<p>Of course the main catch is that not all processor cores are created equal.\u00a0 Those 144 cores in my GPU are, individually, rather wimpy when compared to the ones in the Intel CPU.\u00a0 It&#8217;s the sheer quantity of them that makes the 
difference.\u00a0 The question at the forefront of my mind when I received my shiny new laptop was &#8216;Just how much of a difference?&#8217;<\/p>\n<p>Now I&#8217;ve seen lots of articles that compare CPUs with GPUs and the GPUs always win&#8230;by a lot!\u00a0 Dig down into the meat of these articles, however, and it turns out that things are not as simple as they seem.\u00a0 Roughly speaking, the abstract of some of them could be summed up as &#8216;<strong><em>We took a serial algorithm written by a chimpanzee for an old, outdated CPU and spent 6 months parallelising and fine-tuning it for a top of the line GPU.\u00a0 Our GPU version is up to 150 times faster<\/em>!<\/strong>&#8217;<\/p>\n<p>Well, it would be, wouldn&#8217;t it?!\u00a0 In other news, <a href=\"http:\/\/en.wikipedia.org\/wiki\/Lewis_Hamilton\">Lewis Hamilton<\/a> can drive his F1 supercar around <a href=\"http:\/\/www.silverstone.co.uk\/\">Silverstone<\/a> faster than my dad can in his clapped-out 12-year-old van!\u00a0 These articles are so prevalent that <a href=\"http:\/\/csgillespie.wordpress.com\/\">csgillespie.wordpress.com<\/a> recently published an excellent article that summarised <a href=\"http:\/\/csgillespie.wordpress.com\/2011\/07\/12\/how-to-review-a-gpu-statistics-paper\/\">everything you should consider when evaluating them<\/a>.\u00a0 What you do is take the claimed speed-up, apply a set of common-sense questions and thus determine a realistic speed-up.\u00a0 That factor of 150 can end up more like a factor of 8 once you think about it the right way.<\/p>\n<p>That&#8217;s not to say that GPUs aren&#8217;t powerful or useful&#8230;it&#8217;s just that maybe they&#8217;ve been hyped up a bit too much!<\/p>\n<p>So anyway, back to my laptop.\u00a0 It doesn&#8217;t have a top of the range GPU custom built for scientific computing; instead it has what <a href=\"http:\/\/www.notebookcheck.net\/\">Notebookcheck.net<\/a> refers to as a <em><strong>fast middle class 
graphics card for laptops<\/strong><\/em>.\u00a0 It&#8217;s got all of the required bits though&#8230;144 cores and CUDA compute capability 2.1, so surely it can whip the built-in CPU even if it&#8217;s just by a little bit?<\/p>\n<p>I decided to find out with a few randomly chosen tests.\u00a0 I wasn&#8217;t aiming for the kind of rigor that would lead to a peer-reviewed journal but I did want to follow some basic rules at least:<\/p>\n<ul>\n<li>I will only choose algorithms that have been optimised and parallelised for both the CPU and the GPU.<\/li>\n<li>I will release the source code of the tests so that they can be criticised and repeated by others.<\/li>\n<li>I&#8217;ll do the whole thing in MATLAB using the new GPU functionality in the Parallel Computing Toolbox.\u00a0 So, to repeat my calculations all you need to do is copy and paste some code.\u00a0 Using MATLAB also ensures that I&#8217;m using good quality code for both CPU and GPU.<\/li>\n<\/ul>\n<p><strong>The articles<\/strong><\/p>\n<p>This is the introduction to a set of articles about GPU computing on MATLAB using the Parallel Computing Toolbox.\u00a0 Links to the rest of them are below and more will be added in the future.<\/p>\n<ul>\n<li><a href=\"https:\/\/www.walkingrandomly.com\/?p=3736\">Elementwise operations on the GPU #1<\/a> &#8211; Basic commands using the PCT and how to write a &#8216;GPUs are awesome&#8217; paper, no matter what results you get!<\/li>\n<li><a href=\"https:\/\/www.walkingrandomly.com\/?p=3634\">Elementwise operations on the GPU #2<\/a> &#8211; A slightly more involved example showing a useful speed-up compared to the CPU.\u00a0 An introduction to MATLAB&#8217;s arrayfun.<\/li>\n<li><a href=\"https:\/\/www.walkingrandomly.com\/?p=3604\">Optimising a correlated asset calculation on MATLAB #1: Vectorisation on the CPU<\/a> &#8211; A detailed look at a port from CPU MATLAB code to GPU MATLAB code.<\/li>\n<li><a href=\"https:\/\/www.walkingrandomly.com\/?p=3978\">Optimising 
a correlated asset calculation on MATLAB #2: Using the GPU via the PCT<\/a> &#8211; A detailed look at a port from CPU MATLAB code to GPU MATLAB code.<\/li>\n<li><a href=\"https:\/\/www.walkingrandomly.com\/?p=4062\">Optimising a correlated asset calculation on MATLAB #3: Using the GPU via Jacket<\/a> &#8211; A detailed look at a port from CPU MATLAB code to GPU MATLAB code.<\/li>\n<\/ul>\n<p><strong>External links of interest to MATLABers with an interest in GPUs<\/strong><\/p>\n<ul>\n<li><a href=\"http:\/\/www.mathworks.co.uk\/products\/parallel-computing\/\">The Parallel Computing Toolbox (PCT)<\/a> &#8211; MathWorks&#8217; MATLAB add-on that gives you CUDA GPU support.<\/li>\n<li><a href=\"http:\/\/people.maths.ox.ac.uk\/gilesm\/matlab_gpu\/index.html\">Mike Giles&#8217; MATLAB GPU Blog<\/a> &#8211; from the University of Oxford<\/li>\n<li><a href=\"http:\/\/www.accelereyes.com\/\">AccelerEyes<\/a> &#8211; Developers of &#8216;Jacket&#8217;, an alternative to the Parallel Computing Toolbox.<\/li>\n<li><a href=\"http:\/\/blogs.mathworks.com\/loren\/2011\/07\/18\/a-mandelbrot-set-on-the-gpu\/\">A Mandelbrot Set on the GPU<\/a> &#8211; Using the Parallel Computing Toolbox to make pretty pictures&#8230;FAST!<\/li>\n<li><a href=\"http:\/\/gp-you.org\/\">GP-you.org<\/a> &#8211; A free CUDA-based GPU toolbox for MATLAB<\/li>\n<li><a href=\"http:\/\/noiceinmyscotchplease.blogspot.com\/2011\/06\/matlab-cuda-and-me.html\">Matlab, CUDA and Me<\/a> &#8211; Stu Blair gives various examples of calling CUDA kernels directly from MATLAB<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>These days it seems that you can&#8217;t talk about scientific computing for more than 5 minutes without someone bringing up the topic of Graphics Processing Units (GPUs).\u00a0 Originally designed to make computer games look pretty, GPUs are massively parallel processors that promise to revolutionise the way we compute. 
A brief glance at the specification of [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[44,51,53,4,11,41,7,42],"tags":[],"class_list":["post-3730","post","type-post","status-publish","format-standard","hentry","category-cuda","category-gpu","category-making-matlab-faster","category-math-software","category-matlab","category-parallel-programming","category-programming","category-tutorials"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p3swhs-Ya","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts\/3730","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3730"}],"version-history":[{"count":21,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts\/3730\/revisions"}],"predecessor-version":[{"id":4861,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts\/3730\/revisions\/4861"}],"wp:attachment":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3730"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/walkingrando
mly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3730"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3730"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}