{"id":1795,"date":"2009-10-21T22:07:18","date_gmt":"2009-10-21T21:07:18","guid":{"rendered":"http:\/\/www.walkingrandomly.com\/?p=1795"},"modified":"2013-05-17T10:13:09","modified_gmt":"2013-05-17T09:13:09","slug":"parallel-matlab-with-openmp-mex-files","status":"publish","type":"post","link":"https:\/\/walkingrandomly.com\/?p=1795","title":{"rendered":"Parallel MATLAB with openmp mex files"},"content":{"rendered":"<p>Slowly but surely more and more MATLAB functions are becoming able to take advantage of multi-core processors.\u00a0 For example, in MATLAB 2009b, functions such as <strong>sort<\/strong>, <strong>bsxfun<\/strong>, <strong>filter<\/strong> and <strong>erf<\/strong> (among others) gained the ability to spread the calculational load across several processor cores.\u00a0 This is good news because if your code uses these functions, and if you have a multi-core processor, then you will get faster execution times without having to modify your program.\u00a0\u00a0 This kind of parallelism is called <a href=\"http:\/\/en.wikipedia.org\/wiki\/Implicit_parallelism\">implicit parallelism<\/a> because it doesn&#8217;t require any special commands in order to take advantage of multiple cores &#8211; MATLAB just does it automagically.\u00a0 Faster code for free!<\/p>\n<p>For many people though, this isn&#8217;t enough and they find themselves needing to use <a href=\"http:\/\/en.wikipedia.org\/wiki\/Explicit_parallelism\">explicit parallelism<\/a> where it becomes the programmer&#8217;s responsibility to tell MATLAB exactly how to spread the work between the multiple processing cores available.\u00a0 The standard way of doing this in MATLAB is to use the <a href=\"http:\/\/www.mathworks.co.uk\/products\/parallel-computing\/\">Parallel Computing Toolbox<\/a> but, unlike packages such as <a href=\"http:\/\/www.wolfram.com\/\">Mathematica<\/a> and <a href=\"http:\/\/www.maplesoft.com\/\">Maple<\/a>, this functionality is going to cost you extra.<\/p>\n<p>There is another way though&#8230;<\/p>\n<p>I&#8217;ve recently been working with <a href=\"http:\/\/www.maths.manchester.ac.uk\/~dszotten\/\">David Szotten<\/a>, a postgraduate student at Manchester University, on writing parallel mex files for MATLAB using <a href=\"http:\/\/openmp.org\/wp\/\">openmp<\/a> and <a href=\"http:\/\/en.wikipedia.org\/wiki\/C_%28programming_language%29\">C<\/a>.\u00a0 Granted, it&#8217;s not as easy as using the Parallel Computing Toolbox but it does work and the results can be blisteringly fast since we are working in C.\u00a0 It&#8217;s also not quite as difficult as we originally thought it might be.<\/p>\n<p>There is one golden rule you need to observe:<\/p>\n<p><strong>Never, ever call any of the mex api functions inside the portion of your code that you are trying to parallelise using openmp.\u00a0 Don&#8217;t try to interact with MATLAB at all during the parallel portion of your code.<\/strong><\/p>\n<p>The reason for the golden rule is because, at the time of writing at least (MATLAB 2009b), all <a href=\"http:\/\/www.mathworks.com\/support\/solutions\/en\/data\/1-V3B5T\/index.html?solution=1-V3B5T\">mex api functions are not thread-safe<\/a> and that includes the <strong>printf<\/strong> function since <strong>printf <\/strong>is defined to be <strong>mexPrintf<\/strong> in the <strong>mex.h<\/strong> header file.\u00a0 Stick to the golden rule and you&#8217;ll be fine.\u00a0 Move away from the golden rule,however, <a href=\"http:\/\/en.wikipedia.org\/wiki\/Here_be_dragons\">and you&#8217;ll find dragons<\/a>.\u00a0 Trust me on this!<\/p>\n<p>Let&#8217;s start with an example and find the sum of foo(x) where foo is a moderately complicated function and x is a large number of values.\u00a0 We have implemented such an example in the mex function mex_sum_openmp.<\/p>\n<p>I assume you are running MATLAB 2009b on 32bit Ubuntu Linux version 9.04 &#8211; just like me.\u00a0 If your setup differs from this then you may need to modify the following instructions accordingly.<\/p>\n<ul>\n<li>Before you start.\u00a0 Close all currently running instances of MATLAB.<\/li>\n<li>Download and unzip the file\u00a0 <a href=\"\/images\/matlab\/mex\/mex_sum_openmp.zip\">mex_sum_openmp.zip<\/a> to give you <strong>mex_sum_openmp.c<\/strong><\/li>\n<li>Open up a bash terminal and enter <strong>export OMP_NUM_THREADS=2<\/strong> (replace 2 with however many cores your machine has)<\/li>\n<li>start MATLAB from this terminal by entering <strong>matlab<\/strong>.<\/li>\n<li>Inside MATLAB, navigate to the directory containing <strong>mex_sum_openmp.c<\/strong><\/li>\n<li>Inside MATLAB, enter the following to compile the .c file\n<pre><strong>mex mex_sum_openmp.c CFLAGS=\"\\$CFLAGS -fopenmp\" LDFLAGS=\"\\$LDFLAGS -fopenmp\"<\/strong><\/pre>\n<\/li>\n<\/ul>\n<ul>\n<li>Inside MATLAB, test the code by entering\n<pre>x=rand(1,9999999); %create an array of random x values\r\nmex_sum_openmp(x) %calculate the sum of foo(x(i))<\/pre>\n<\/li>\n<li>If all goes well, this calculation will be done in parallel and you will be rewarded with a single number.\u00a0 Time the calculation as follows.\n<pre>tic;mex_sum_openmp(x);toc<\/pre>\n<\/li>\n<li>My dual core laptop did this in about 0.6 seconds on average.\u00a0 To see how much faster this is compared to single core mode just repeat all of the steps above but start off with <strong>export OMP_NUM_THREADS=1<\/strong> (You won&#8217;t need to recompile the mex function).\u00a0 On doing this, my laptop did it in 1.17 seconds and so the 2 core mode is almost exactly 2 times faster.<\/li>\n<\/ul>\n<p>Let&#8217;s take a look at the code.<\/p>\n<pre>#include \"mex.h\"\r\n#include &lt;math.h&gt;\r\n#include &lt;omp.h&gt;\r\n\r\ndouble spawn_threads(double x[],int cols)\r\n{\r\n\tdouble sum=0;\r\n\tint count;\r\n        #pragma omp parallel for shared(x, cols) private(count) reduction(+: sum)\r\n\tfor(count = 0;count&lt;cols;count++)\r\n\t{\r\n\t     sum += sin(x[count]*x[count]*x[count])\/cos(x[count]+1.0);;\r\n\t}\r\n\r\n\treturn sum;\r\n}\r\n\r\nvoid mexFunction(int nlhs,mxArray* plhs[], int nrhs, const mxArray* prhs[])\r\n{\r\n\tdouble sum;\r\n\t \/* Check for proper number of arguments. *\/\r\n \tif(nrhs!=1) {\r\n    \tmexErrMsgTxt(\"One input required.\");\r\n  \t} else if(nlhs&gt;1) {\r\n    \tmexErrMsgTxt(\"Too many output arguments.\");\r\n  \t}\r\n\r\n\t\/*Find number of rows and columns in input data*\/\r\n\tint rows = mxGetM(prhs[0]);\r\n\tint cols = mxGetN(prhs[0]);\t\t\r\n\r\n\t\/*I'm only going to consider row matrices in this code so ensure that rows isn't more than 1*\/\r\n\tif(rows&gt;1){\r\n\tmexErrMsgTxt(\"Input data has too many Rows.  Just the one please\");\r\n\t}\r\n\r\n\t\/*Get pointer to input data*\/\r\n\tdouble* x = mxGetPr(prhs[0]);\r\n\r\n\t\/*Do the calculation*\/\r\n\tsum = spawn_threads(x,cols);\r\n\r\n\t\/*create space for output*\/\r\n \t plhs[0] = mxCreateDoubleMatrix(1,1, mxREAL);\r\n\r\n\t\/*Get pointer to output array*\/\r\n\tdouble* y = mxGetPr(plhs[0]);\r\n\r\n\t\/*Return sum to output array*\/\r\n\ty[0]=sum;\r\n\r\n}<\/pre>\n<p>That&#8217;s pretty much it &#8211; the golden rule in action. I hope you find this example useful and thanks again to <a href=\"http:\/\/www.maths.manchester.ac.uk\/~dszotten\/index.html\">David Szotten<\/a> who helped me figure all of this out.\u00a0 Comments welcomed.<\/p>\n<p><strong>Other articles you may find useful<\/strong><\/p>\n<ul>\n<li>\u00a0<a href=\"https:\/\/www.walkingrandomly.com\/?p=3898\">MATLAB mex functions using the NAG C Library in Windows<\/a> &#8211; This article includes an OpenMP example compiled on Visual Studio 2008 in Windows.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Slowly but surely more and more MATLAB functions are becoming able to take advantage of multi-core processors.\u00a0 For example, in MATLAB 2009b, functions such as sort, bsxfun, filter and erf (among others) gained the ability to spread the calculational load across several processor cores.\u00a0 This is good news because if your code uses these functions, [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[52,11,7],"tags":[],"class_list":["post-1795","post","type-post","status-publish","format-standard","hentry","category-making-mathematica-faster","category-matlab","category-programming"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p3swhs-sX","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts\/1795","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1795"}],"version-history":[{"count":25,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts\/1795\/revisions"}],"predecessor-version":[{"id":1825,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts\/1795\/revisions\/1825"}],"wp:attachment":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1795"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1795"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1795"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}