Faster GPU Random Number Generators in MATLAB 2012b
Ever since I looked at GPU-accelerating simple Monte Carlo simulations using MATLAB, I’ve been disappointed with the performance of its GPU random number generator. In MATLAB 2012a, for example, the GPU generator is not much faster than the CPU implementation on my hardware. Consider the following code:
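function gpuRandTest2012a(n)
% Time the generation of an n-by-n matrix of random numbers on CPU and GPU
mydev=gpuDevice();

disp('CPU - Mersenne Twister');
tic
CPU = rand(n);
toc

% Make mrg32k3a the global GPU random number stream
sg = parallel.gpu.RandStream('mrg32k3a','Seed',1);
parallel.gpu.RandStream.setGlobalStream(sg);

disp('GPU - mrg32k3a');
tic
Rg = parallel.gpu.GPUArray.rand(n);
wait(mydev); % wait for the GPU to finish before stopping the timer
toc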
Running this on MATLAB 2012a on my laptop gives the following typical times. (If you try this out yourself, the first run will always be slower, for various reasons I’ll not go into here.)
>> gpuRandTest2012a(10000)
CPU - Mersenne Twister
Elapsed time is 1.330505 seconds.
GPU - mrg32k3a
Elapsed time is 1.006842 seconds.
Running the same code on MATLAB 2012b, however, gives a very pleasant surprise, with typical run times looking like this:
CPU - Mersenne Twister
Elapsed time is 1.590764 seconds.
GPU - mrg32k3a
Elapsed time is 0.185686 seconds.
So, generation of random numbers using the GPU is now over 7 times faster than CPU generation on my laptop hardware, a significant improvement on the previous implementation.
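As a quick sanity check on those ratios, here is a small sketch that recomputes the speedups from the timings quoted above. The variable names are illustrative only, and single tic/toc measurements like these vary from run to run, so the results should be read as indicative rather than exact.

```matlab
% Timings copied from the runs quoted above (seconds).
cpu2012a = 1.330505;  gpu2012a = 1.006842;
cpu2012b = 1.590764;  gpu2012b = 0.185686;

% Speedup = CPU time / GPU time within the same MATLAB release.
fprintf('2012a GPU vs CPU: %.2fx\n', cpu2012a / gpu2012a);   % about 1.32x
fprintf('2012b GPU vs CPU: %.2fx\n', cpu2012b / gpu2012b);   % about 8.57x

% Improvement of the GPU mrg32k3a generator between the two releases.
fprintf('GPU 2012a vs 2012b: %.2fx\n', gpu2012a / gpu2012b); % about 5.42x
```

For this particular pair of runs the GPU comes out a little over 8 times faster than the CPU in 2012b, consistent with the "over 7 times" figure once run-to-run variation is allowed for.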
New generators in 2012b
The MATLAB developers went a little further in 2012b though. Not only have they significantly improved the performance of the mrg32k3a combined multiple recursive generator, they have also implemented two new GPU random number generators based on the Random123 library. Here are the timings for the generation of 100 million random numbers (a 10000-by-10000 matrix) in MATLAB 2012b:
- Get the code – gpuRandTest2012b.m
CPU - Mersenne Twister
Elapsed time is 1.370252 seconds.
GPU - mrg32k3a
Elapsed time is 0.186152 seconds.
GPU - Threefry4x64-20
Elapsed time is 0.145144 seconds.
GPU - Philox4x32-10
Elapsed time is 0.129030 seconds.
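The linked gpuRandTest2012b.m is not reproduced in this post. The following is only a rough sketch of how a benchmark producing output like the above could be written, not the actual script. It assumes that the labels in the output ('Threefry4x64-20' and 'Philox4x32-10') are accepted as generator names by parallel.gpu.RandStream, and that gpuArray.rand is the non-deprecated replacement for parallel.gpu.GPUArray.rand in 2012b. The function name gpuRandSketch2012b is hypothetical. It needs the Parallel Computing Toolbox and a supported CUDA GPU.

```matlab
function gpuRandSketch2012b(n)
% Hypothetical sketch, NOT the author's gpuRandTest2012b.m.
% Times generation of an n-by-n random matrix on the CPU and with
% each of three GPU generators.
mydev = gpuDevice();

disp('CPU - Mersenne Twister');
tic
C = rand(n); %#ok<NASGU>
toc

% Generator names are assumed from the output labels in the post.
gens = {'mrg32k3a', 'Threefry4x64-20', 'Philox4x32-10'};

for k = 1:numel(gens)
    % Make this generator the global GPU random number stream.
    sg = parallel.gpu.RandStream(gens{k}, 'Seed', 1);
    parallel.gpu.RandStream.setGlobalStream(sg);

    disp(['GPU - ' gens{k}]);
    tic
    G = gpuArray.rand(n); %#ok<NASGU>
    wait(mydev); % block until the GPU has finished before stopping the timer
    toc
end
end
```

Calling gpuRandSketch2012b(10000) would request 100 million numbers per generator. On GPUs with little free memory a smaller n may be needed.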
Bear in mind that I am running this on the relatively weak GPU of my laptop! If anyone runs it on something stronger, I’d love to hear of your results.
- Laptop model: Dell XPS L702X
- CPU: Intel Core i7-2630QM @ 2 GHz, software overclockable to 2.9 GHz. 4 physical cores, 8 virtual cores in total due to Hyper-Threading.
- GPU: GeForce GT 555M with 144 CUDA cores. Graphics clock: 590 MHz. Processor clock: 1180 MHz. 3072 MB DDR3 memory.
- RAM: 8 GB
- OS: Windows 7 Home Premium 64 bit.
- MATLAB: 2012a/2012b
Tried to run the script, but it couldn’t find “gpuDevice.m”…then I realized it’s in the Parallel Computing Toolbox, which I don’t have. Too bad…just got my GTX Titan and was excited to post.
Sorry about that David…it’s a big problem with MATLAB. It is terribly frustrating to want to run interesting scripts only to discover that the author used toolboxes you don’t own. If only MATLAB was a more integrated product!
No worries – yeah. As a small startup company, I just can’t pony up the thousand dollars per toolbox…my fault for getting hooked in grad school. Julia looks cool: julialang.org – have you experimented with it?
Guess I answered my own question: http://www.walkingrandomly.com/?s=julia
Running Matlab 2012b on Xubuntu 12.10. CPU i7-3820 GPU GTX-680M
>> gpuRandTest2012b(10000)
CPU - Mersenne Twister
Elapsed time is 0.893462 seconds.
GPU - mrg32k3a
Elapsed time is 0.119311 seconds.
GPU - Threefry4x64-20
Elapsed time is 0.052395 seconds.
GPU - Philox4x32-10
Elapsed time is 0.053489 seconds.
Thanks for that, Wilton
Couldn’t run 10000 as my GPU memory is small and my displays are large, but here is Matlab 2013a on Ubuntu Gnome 13.10, i5 2500k and GTX560Ti:
>> gpuRandTest2012b(5000)
CPU - Mersenne Twister
Elapsed time is 0.230607 seconds.
GPU - mrg32k3a
Elapsed time is 0.016539 seconds.
GPU - Threefry4x64-20
Elapsed time is 0.009059 seconds.
GPU - Philox4x32-10
Elapsed time is 0.010122 seconds.
I removed the deprecation message for parallel.gpu.GPUArray.rand.