{"id":3604,"date":"2012-02-06T19:49:38","date_gmt":"2012-02-06T18:49:38","guid":{"rendered":"http:\/\/www.walkingrandomly.com\/?p=3604"},"modified":"2015-10-07T16:49:51","modified_gmt":"2015-10-07T15:49:51","slug":"optimising-a-correlated-asset-calculation-on-matlab-1-vectorisation-on-the-cpu","status":"publish","type":"post","link":"https:\/\/walkingrandomly.com\/?p=3604","title":{"rendered":"Optimising a correlated asset calculation on MATLAB #1 &#8211; Vectorisation on the CPU"},"content":{"rendered":"<p>Recently, I&#8217;ve been working with members of <a href=\"http:\/\/www.mbs.ac.uk\">The Manchester Business School<\/a> to help optimise their <a href=\"http:\/\/www.mathworks.co.uk\/\">MATLAB<\/a> code. We&#8217;ve had some great successes using techniques such as <a href=\"http:\/\/www.mathworks.co.uk\/support\/tech-notes\/1100\/1109.html\">vectorisation<\/a>, <a href=\"https:\/\/www.walkingrandomly.com\/?p=3898\">mex files<\/a> and <a href=\"https:\/\/www.walkingrandomly.com\/?p=1488\">The NAG Toolbox for MATLAB<\/a> (among others) combined with the raw computational power of Manchester&#8217;s<a href=\"http:\/\/en.wikipedia.org\/wiki\/Condor_High-Throughput_Computing_System\"> Condor Pool<\/a> (which I help run along with providing applications support).<\/p>\n<p>A couple of months ago, I had the idea of taking a standard calculation in computational finance and seeing how fast I could make it run on MATLAB using various techniques. I&#8217;d then write these up and publish here for comment.<\/p>\n<p>So, I asked one of my collaborators, Yong Woong Lee, a doctoral researcher in the Manchester Business School, if he could furnish me with a very simple piece computational finance code. I asked for something that was written to make it easy to see the underlying mathematics rather than for speed and he duly obliged with several great examples. I took one of these examples and stripped it down even further to its very bare bones.\u00a0 In doing so I may have made his example close to useless from a computational finance point of view but it gave me something nice and simple to play with.<\/p>\n<p>What I ended up with was a simple piece of code that uses monte carlo techniques to find the distribution of two correlated assets: <a href=\"https:\/\/www.walkingrandomly.com\/images\/matlab\/corr_asset\/1\/original_corr.m\">original_corr.m<\/a><\/p>\n<pre>%ORIGINAL_CORR - The original, unoptimised code that simulates two correlated assets\r\n\r\n%% Correlated asset information\r\nCurrentPrice = [78 102];       %Initial Prices of the two stocks\r\nCorr = [1 0.4; 0.4 1];         %Correlation Matrix\r\nT = 500;                       %Number of days to simulate = 2years = 500days\r\nn = 100000;                    %Number of simulations\r\ndt = 1\/250;                    %Time step (1year = 250days)\r\nDiv=[0.01 0.01];               %Dividend\r\nVol=[0.2 0.3];                 %Volatility\r\n\r\n%%Market Information\r\nr = 0.03;                      %Risk-free rate\r\n\r\n%% Define storages\r\nSimulPriceA=zeros(T,n);    %Simulated Price of Asset A\r\nSimulPriceA(1,:)=CurrentPrice(1);\r\nSimulPriceB=zeros(T,n);    %Simulated Price of Asset B\r\nSimulPriceB(1,:)=CurrentPrice(2);\r\n\r\n%% Generating the paths of stock prices by Geometric Brownian Motion\r\nUpperTriangle=chol(Corr);    %UpperTriangle Matrix by Cholesky decomposition\r\n\r\nfor i=1:n\r\n   Wiener=randn(T-1,2);\r\n   CorrWiener=Wiener*UpperTriangle;\r\n   for j=2:T\r\n      SimulPriceA(j,i)=SimulPriceA(j-1,i)*exp((r-Div(1)-Vol(1)^2\/2)*dt+Vol(1)*sqrt(dt)*CorrWiener(j-1,1));\r\n      SimulPriceB(j,i)=SimulPriceB(j-1,i)*exp((r-Div(2)-Vol(2)^2\/2)*dt+Vol(2)*sqrt(dt)*CorrWiener(j-1,2));\r\n   end\r\nend\r\n\r\n%% Plot the distribution of final prices\r\n% Comment this section out if doing timings\r\n% subplot(1,2,1);hist(SimulPriceA(end,:),100);\r\n% subplot(1,2,2);hist(SimulPriceB(end,:),100);<\/pre>\n<p>On my laptop, this code takes<strong> 10.82 seconds<\/strong> to run on average.\u00a0 If you comment out the final two lines then it&#8217;ll take a little longer and will produce a histogram of the distribution of final prices.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.walkingrandomly.com\/images\/matlab\/corr_asset\/1\/corr_assets.png\" alt=\"Two correlated assets\" \/><\/p>\n<p>I&#8217;m going to take this code and modify it in various ways to see how different techniques and technologies can make it run more quickly. Here is a list of everything I have done so far.<\/p>\n<ul>\n<li><a href=\"https:\/\/www.walkingrandomly.com\/?p=360\">Standard vectorisation<\/a> (This article)<\/li>\n<li><a href=\"https:\/\/www.walkingrandomly.com\/?p=3978\">Running on a GPU using the Parallel Computing Toolbox<\/a><\/li>\n<li><a href=\"https:\/\/www.walkingrandomly.com\/?p=4062\"> Running on a GPU using Jacket from AccelerEyes<\/a><\/li>\n<\/ul>\n<p><strong>Vectorisation &#8211; removing loops to make code go faster<\/strong><\/p>\n<p>Now, the most obvious optimisation that we can do with code like this is to use <a href=\"http:\/\/www.mathworks.co.uk\/support\/tech-notes\/1100\/1109.html\">vectorisation<\/a> to get rid of that double loop. The <a href=\"http:\/\/www.mathworks.co.uk\/help\/techdoc\/ref\/cumprod.html\">cumprod<\/a> command is the key to doing this and the resulting code looks as follows: <a href=\"https:\/\/www.walkingrandomly.com\/images\/matlab\/corr_asset\/1\/optimised_corr1.m\">optimised_corr1.m<\/a><\/p>\n<pre>%OPTIMISED_CORR1 - A pure-MATLAB optimised code that simulates two correlated assets\r\n\r\n%% Correlated asset information\r\nCurrentPrice = [78 102];       %Initial Prices of the two stocks\r\nCorr = [1 0.4; 0.4 1];         %Correlation Matrix\r\nT = 500;                       %Number of days to simulate = 2years = 500days\r\nDiv=[0.01 0.01];               %Dividend\r\nVol=[0.2 0.3];                 %Volatility\r\n\r\n%% Market Information\r\nr = 0.03;                      %Risk-free rate\r\n\r\n%% Simulation parameters\r\nn=100000;                       %Number of simulation\r\ndt=1\/250;                       %Time step (1year = 250days)\r\n\r\n%% Define storages\r\nSimulPrices=repmat(CurrentPrice,n,1);\r\nCorrWiener = zeros(T-1,2,n);\r\n\r\n%% Generating the paths of stock prices by Geometric Brownian Motion\r\nUpperTriangle=chol(Corr);    %UpperTriangle Matrix by Cholesky decomposition\r\n\r\nfor i=1:n\r\n     CorrWiener(:,:,i)=randn(T-1,2)*UpperTriangle;\r\nend\r\nVolr = repmat(Vol,[T-1,1,n]);\r\nDivr = repmat(Div,[T-1,1,n]);\r\n\r\n%% do simulation\r\nsim = cumprod(exp((r-Divr-Volr.^2.\/2).*dt+Volr.*sqrt(dt).*CorrWiener));\r\n%get just the final prices\r\nSimulPrices = SimulPrices.*reshape(sim(end,:,:),2,n)';\r\n\r\n%% Plot the distribution of final prices\r\n% Comment this section out if doing timings\r\n%subplot(1,2,1);hist(SimulPrices(:,1),100);\r\n%subplot(1,2,2);hist(SimulPrices(:,2),100);<\/pre>\n<p>This code takes an average of <strong>4.19 seconds<\/strong> to run on my laptop giving us a factor of <strong>2.58 times speed up<\/strong> over the original.\u00a0 This improvement in speed is not without its cost, however, and the price we have to pay is memory.\u00a0 Let&#8217;s take a look at the amount of memory used by MATLAB after running these two versions. First, the original<\/p>\n<pre>&gt;&gt; clear all\r\n&gt;&gt; memory\r\nMaximum possible array:              13344 MB (1.399e+010 bytes) *\r\nMemory available for all arrays:     13344 MB (1.399e+010 bytes) *\r\nMemory used by MATLAB:                 553 MB (5.800e+008 bytes)\r\nPhysical Memory (RAM):                8106 MB (8.500e+009 bytes)\r\n\r\n*  Limited by System Memory (physical + swap file) available.\r\n&gt;&gt; original_corr\r\n&gt;&gt; memory\r\nMaximum possible array:              12579 MB (1.319e+010 bytes) *\r\nMemory available for all arrays:     12579 MB (1.319e+010 bytes) *\r\nMemory used by MATLAB:                1315 MB (1.379e+009 bytes)\r\nPhysical Memory (RAM):                8106 MB (8.500e+009 bytes)\r\n\r\n*  Limited by System Memory (physical + swap file) available.<\/pre>\n<p>Now for the vectorised version<\/p>\n<pre>&gt;&gt; %now I clear the workspace and check that all memory has been recovered%\r\n&gt;&gt; clear all\r\n&gt;&gt; memory\r\nMaximum possible array:              13343 MB (1.399e+010 bytes) *\r\nMemory available for all arrays:     13343 MB (1.399e+010 bytes) *\r\nMemory used by MATLAB:                 555 MB (5.818e+008 bytes)\r\nPhysical Memory (RAM):                8106 MB (8.500e+009 bytes)\r\n\r\n*  Limited by System Memory (physical + swap file) available.\r\n&gt;&gt; optimised_corr1\r\n&gt;&gt; memory\r\nMaximum possible array:              10297 MB (1.080e+010 bytes) *\r\nMemory available for all arrays:     10297 MB (1.080e+010 bytes) *\r\nMemory used by MATLAB:                3596 MB (3.770e+009 bytes)\r\nPhysical Memory (RAM):                8106 MB (8.500e+009 bytes)\r\n\r\n*  Limited by System Memory (physical + swap file) available.<\/pre>\n<p>So, the original version used around 762Mb of RAM whereas the vectorised version used 3041Mb. If you don&#8217;t have enough memory then you may find that the vectorised version runs <strong>very<\/strong> slowly indeed!<\/p>\n<p><strong>Adding a loop to the vectorised code to make it go even faster!<\/strong><\/p>\n<p>Simple vectorisation improvements such as the one above are sometimes so effective that it can lead MATLAB programmers to have a pathological fear of loops. This fear is becoming increasingly unjustified thanks to the steady improvements in MATLAB&#8217;s Just In Time (JIT) compiler.\u00a0 Discussing the details of MATLAB&#8217;s JIT is beyond the scope of these articles but the practical upshot is that you don&#8217;t need to be as frightened of loops as you used to.<\/p>\n<p>In fact, it turns out that once you have finished vectorising your code, you may be able to make it go even faster by putting a loop back in (not necessarily thanks to the JIT though)! The following code takes an average of <strong>3.42 seconds<\/strong> on my laptop bringing the total <strong>speed-up to a factor of 3.16<\/strong> compared to the original.\u00a0 The only difference is that I have added a loop over the variable ii to split up the cumprod calculation over groups of 125 at a time.<\/p>\n<p>I have a confession: I have no real idea why this modification causes the code to go noticeably faster.\u00a0 I do know that it uses a lot less memory; using the <a href=\"http:\/\/www.mathworks.co.uk\/help\/techdoc\/ref\/memory.html\">memory<\/a> command, as I did above, I determined that it uses around 10Mb compared to 3041Mb of the original vectorised version.\u00a0 You may be wondering why I set sub_size to be 125 since I could have chosen any divisor of 100000.\u00a0 Well, I <a href=\"https:\/\/www.walkingrandomly.com\/images\/matlab\/corr_asset\/1\/block_sizes.txt\">tried them all<\/a> and it turned out that 125 was slightly faster than any other on my machine.\u00a0 Maybe we are seeing some sort of CPU cache effect?\u00a0 I just don&#8217;t know: <a href=\"https:\/\/www.walkingrandomly.com\/images\/matlab\/corr_asset\/1\/optimised_corr2.m\">optimised_corr2.m<\/a><\/p>\n<pre>%OPTIMISED_CORR2 - A pure-MATLAB optimised code that simulates two correlated assets\r\n%% Correlated asset information\r\nCurrentPrice = [78 102];       %Initial Prices of the two stocks\r\nCorr = [1 0.4; 0.4 1];         %Correlation Matrix\r\nT = 500;                       %Number of days to simulate = 2years = 500days\r\nDiv=[0.01 0.01];               %Dividend\r\nVol=[0.2 0.3];                 %Volatility\r\n\r\n%% Market Information\r\nr = 0.03;                      %Risk-free rate\r\n\r\n%% Simulation parameters\r\nn=100000;                       %Number of simulation\r\nsub_size = 125;\r\ndt=1\/250;                       %Time step (1year = 250days)\r\n\r\n%% Define storages\r\nSimulPrices=repmat(CurrentPrice,n,1);\r\nCorrWiener = zeros(T-1,2,sub_size);\r\n\r\n%% Generating the paths of stock prices by Geometric Brownian Motion\r\nUpperTriangle=chol(Corr);    %UpperTriangle Matrix by Cholesky decomposition\r\nVolr = repmat(Vol,[T-1,1,sub_size]);\r\nDivr = repmat(Div,[T-1,1,sub_size]);\r\n\r\nfor ii = 1:sub_size:n\r\n   for i=1:sub_size\r\n        CorrWiener(:,:,i)=randn(T-1,2)*UpperTriangle;\r\n   end\r\n\r\n   %% do simulation\r\n   sim = cumprod(exp((r-Divr-Volr.^2.\/2).*dt+Volr.*sqrt(dt).*CorrWiener));\r\n   %get just the final prices\r\n   SimulPrices(ii:ii+sub_size-1,:) = SimulPrices(ii:ii+sub_size-1,:).*reshape(sim(end,:,:),2,sub_size)';\r\nend\r\n%% Plot the distribution of final prices\r\n% Comment this section out if doing timings\r\n%subplot(1,2,1);hist(SimulPrices(:,1),100);\r\n%subplot(1,2,2);hist(SimulPrices(:,2),100);<\/pre>\n<p><strong>Some things that might have worked (but didn&#8217;t)<\/strong><\/p>\n<ol>\n<li>In general, functions are faster than scripts in MATLAB because MATLAB employs more aggressive optimisation techniques for functions.\u00a0 In this case, however, it didn&#8217;t make any noticeable difference on my machine.\u00a0 Try it out for yourself with <a href=\"https:\/\/www.walkingrandomly.com\/images\/matlab\/corr_asset\/1\/optimised_corr3.m\">optimised_corr3.m<\/a><\/li>\n<li>When generating the correlated random numbers, the above code performs lots of small matrix multiplications:\n<pre>   for i=1:sub_size\r\n        CorrWiener(:,:,i)=randn(T-1,2)*UpperTriangle;\r\n   end<\/pre>\n<p>It is often the case that you can get more flops out of a system by doing a single large matrix-matrix multiply than lots of little ones. So, we could do this instead:<\/p>\n<pre>%Generate correlated random numbers\r\n%using one big multiplication\r\nrandoms = randn(sub_size*(T-1),2);\r\nCorrWiener = randoms*UpperTriangle;\r\nCorrWiener = reshape(CorrWiener,(T-1),sub_size,2);\r\nCorrWiener = permute(CorrWiener,[1 3 2]);<\/pre>\n<p>Sadly, however, any gains that we might have made by doing a single matrix-matrix multiply are lost when the resulting matrix is reshaped and permuted into the form needed for the rest of the code (on my machine at least).\u00a0 Feel free to try for yourself using <a href=\"https:\/\/www.walkingrandomly.com\/images\/matlab\/corr_asset\/1\/optimised_corr3.m\">optimised_corr4.m<\/a> &#8211; the input argument of which determines the sub_size.<\/li>\n<\/ol>\n<p><strong>Ask the audience<\/strong><\/p>\n<p>Can you do better using nothing but pure matlab (i.e. no mex, parallel computing toolbox, GPUs or other such trickery&#8230;they are for later articles)?\u00a0 If so then feel free to <a href=\"https:\/\/www.walkingrandomly.com\/?page_id=2055\">contact me<\/a> and let me know.<\/p>\n<p><strong>Acknowledgements<\/strong><\/p>\n<p>Thanks to Yong Woong Lee of the Manchester Business School as well as various employees at The Mathworks for useful discussions and advice.\u00a0 Any mistakes that remain are all my own :)<\/p>\n<p><strong>License<\/strong><\/p>\n<p>All code listed in this article is licensed under the <a href=\"https:\/\/www.walkingrandomly.com\/wp-content\/uploads\/2015\/10\/3-clause-BSD.txt\">3-clause BSD license<\/a>.<\/p>\n<p><strong>The test laptop<\/strong><\/p>\n<ul>\n<li>Laptop model: Dell XPS L702X<\/li>\n<li>CPU:<a href=\"http:\/\/www.notebookcheck.net\/Intel-Core-i7-2630QM-Notebook-Processor.41483.0.html\"> Intel Core i7-2630QM<\/a> @2Ghz software overclockable to 2.9Ghz. 4 physical cores but total 8 virtual cores due to Hyperthreading.<\/li>\n<li>GPU: <a href=\"http:\/\/www.notebookcheck.net\/NVIDIA-GeForce-GT-555M.41933.0.html\">GeForce GT 555M<\/a> with 144 CUDA Cores.\u00a0 Graphics clock: 590Mhz.\u00a0 Processor Clock:1180 Mhz. 3072 Mb DDR3 Memeory<\/li>\n<li>RAM: 8 Gb<\/li>\n<li>OS: Windows 7 Home Premium 64 bit.<\/li>\n<li>MATLAB: 2011b<\/li>\n<\/ul>\n<p><strong>Next Time<\/strong><\/p>\n<p>In the second part of this series I look at how to run this code on the GPU using The Mathworks&#8217; Parallel Computing Toolbox.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recently, I&#8217;ve been working with members of The Manchester Business School to help optimise their MATLAB code. We&#8217;ve had some great successes using techniques such as vectorisation, mex files and The NAG Toolbox for MATLAB (among others) combined with the raw computational power of Manchester&#8217;s Condor Pool (which I help run along with providing applications [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[53,4,11,7],"tags":[],"class_list":["post-3604","post","type-post","status-publish","format-standard","hentry","category-making-matlab-faster","category-math-software","category-matlab","category-programming"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p3swhs-W8","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts\/3604","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3604"}],"version-history":[{"count":63,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts\/3604\/revisions"}],"predecessor-version":[{"id":5875,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=\/wp\/v2\/posts\/3604\/revisions\/5875"}],"wp:attachment":[{"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3604"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3604"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/walkingrandomly.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3604"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}