{"id":4167,"date":"2012-02-12T13:40:57","date_gmt":"2012-02-12T12:40:57","guid":{"rendered":"http:\/\/www.walkingrandomly.com\/?p=4167"},"modified":"2012-02-12T13:40:57","modified_gmt":"2012-02-12T12:40:57","slug":"a-brief-look-at-cuda-support-in-maple-15","status":"publish","type":"post","link":"https:\/\/walkingrandomly.com\/?p=4167","title":{"rendered":"A brief look at CUDA support in Maple 15"},"content":{"rendered":"<p>Maple has had support for NVidia GPUs since version 14 but I&#8217;ve not played with it much until recently.\u00a0 Essentially I was put off by the fact that Maple&#8217;s CUDA package seemed to have support for only one function &#8211; Matrix-Matrix Multiplication. However, a recent conversation with a Maple developer changed my mind.<\/p>\n<p>It is true that only MatrixMatrixMultiply has been accelerated but when you flip the CUDA switch in Maple, every function in the LinearAlgebra package that calls MatrixMatrixMultiply also gets accelerated.\u00a0 This leads to the possibility of a lot of speed-ups for very little work.<\/p>\n<p>So, this morning I thought I would take a closer look using my laptop.\u00a0 Let&#8217;s start by timing how long it takes the CPU to multiply two 4000 by 4000 double precision matrices<\/p>\n<pre>with(LinearAlgebra):\r\nCUDA:-Enable(false):\r\nCUDA:-IsEnabled();\r\na := RandomMatrix(4000, datatype = float[8]):\r\nb := RandomMatrix(4000, datatype = float[8]):\r\nt := time[real]():\r\nc := a.b:\r\ntime[real]()-t<\/pre>\n<p>The exact time varied a little from run to run but 3.76 seconds is a typical result.  
I&#8217;m only feeling my way at this stage so I&#8217;m not doing any proper benchmarking.<\/p>\n<p>To do this calculation on the GPU, all I need to do is change the line<\/p>\n<pre>CUDA:-Enable(false):<\/pre>\n<p>to<\/p>\n<pre>CUDA:-Enable(true):<\/pre>\n<p>like so<\/p>\n<pre>with(LinearAlgebra):\r\nCUDA:-Enable(true):\r\nCUDA:-IsEnabled();\r\na := RandomMatrix(4000, datatype = float[8]):\r\nb := RandomMatrix(4000, datatype = float[8]):\r\nt := time[real]():\r\nc := a.b:\r\ntime[real]()-t<\/pre>\n<p>Typical execution time was 8.37 seconds so the <strong>GPU version is more than 2 times slower than the CPU version on my machine<\/strong>.<\/p>\n<p><strong>Trying different matrix sizes<\/strong><\/p>\n<p>Not wanting to admit defeat after just a single trial, I timed the above code using different matrix sizes.\u00a0 Here are the results:<\/p>\n<ul>\n<li>1000 by 1000: CPU=0.07 seconds, GPU=0.17 seconds<\/li>\n<li>2000 by 2000: CPU=0.53 seconds, GPU=1.07 seconds<\/li>\n<li>4000 by 4000: CPU=3.76 seconds, GPU=8.37 seconds<\/li>\n<li>5000 by 5000: CPU=7.44 seconds, GPU=19.48 seconds<\/li>\n<\/ul>\n<p><strong>Switching to single precision<\/strong><\/p>\n<p>GPUs are usually much faster in single precision than in double precision so I had a try with that too.\u00a0 All you need to do is change<\/p>\n<pre>datatype = float[8]<\/pre>\n<p>to<\/p>\n<pre>datatype = float[4]<\/pre>\n<p>in the above code.  
The results are:<\/p>\n<ul>\n<li>1000 by 1000: CPU=0.03 seconds, GPU=0.07 seconds<\/li>\n<li>2000 by 2000: CPU=0.35 seconds, GPU=0.66 seconds<\/li>\n<li>4000 by 4000: CPU=1.86 seconds, GPU=2.37 seconds<\/li>\n<li>5000 by 5000: CPU=3.81 seconds, GPU=5.2 seconds<\/li>\n<\/ul>\n<p>So the GPU loses in single precision mode too on my hardware.\u00a0 If I can&#8217;t get a speedup with MatrixMatrixMultiply on my system then there is little point in exploring the other LinearAlgebra routines, since the only ones CUDA can affect are those that call MatrixMatrixMultiply and they are unlikely to be any faster with CUDA acceleration switched on.<\/p>\n<p>I guess that in this case, my CPU is too powerful and my GPU is too wimpy to see the acceleration I was hoping for.<\/p>\n<p>Thanks to Maplesoft for providing me with a review copy of Maple 15.<\/p>\n<p><strong>Test System Specification<\/strong><\/p>\n<ul>\n<li>Laptop model: Dell XPS L702X<\/li>\n<li>CPU: <a href=\"http:\/\/www.notebookcheck.net\/Intel-Core-i7-2630QM-Notebook-Processor.41483.0.html\">Intel Core i7-2630QM<\/a> @ 2 GHz, software overclockable to 2.9 GHz. 4 physical cores but 8 virtual cores in total due to Hyper-Threading.<\/li>\n<li>GPU: <a href=\"http:\/\/www.notebookcheck.net\/NVIDIA-GeForce-GT-555M.41933.0.html\">GeForce GT 555M<\/a> with 144 CUDA cores.\u00a0 Graphics clock: 590 MHz.\u00a0 Processor clock: 1180 MHz. 3072 MB DDR3 memory<\/li>\n<li>RAM: 8 GB<\/li>\n<li>OS: Windows 7 Home Premium 64 bit.<\/li>\n<li>Maple 15<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Maple has had support for NVidia GPUs since version 14 but I&#8217;ve not played with it much until recently.\u00a0 Essentially I was put off by the fact that Maple&#8217;s CUDA package seemed to have support for only one function &#8211; Matrix-Matrix Multiplication. However, a recent conversation with a Maple developer changed my mind. 
It is [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[44,51,25],"tags":[],"class_list":["post-4167","post","type-post","status-publish","format-standard","hentry","category-cuda","category-gpu","category-maple"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p3swhs-15d","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts\/4167","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4167"}],"version-history":[{"count":9,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts\/4167\/revisions"}],"predecessor-version":[{"id":4176,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts\/4167\/revisions\/4176"}],"wp:attachment":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4167"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4167"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2F
tags&post=4167"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}