Archive for the ‘programming’ Category
MATLAB contains a function called pdist that calculates the ‘Pairwise distance between pairs of objects’. Typical usage is
X=rand(10,2); dists=pdist(X,'euclidean');
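To make it concrete, here is a rough, illustrative sketch of what the Euclidean case computes (this is not the Statistics Toolbox implementation, just a naive loop for explanation). The result is a row vector holding the n(n-1)/2 distances between every pair of rows of X, in the same pairwise order that pdist uses.
n = size(X,1);
naive_dists = zeros(1,n*(n-1)/2);   % one distance per pair of rows
k = 1;
for i = 1:n-1
    for j = i+1:n
        naive_dists(k) = norm(X(i,:) - X(j,:));   % Euclidean distance between rows i and j
        k = k + 1;
    end
end
max(abs(naive_dists - dists))   % should be essentially zero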
It’s a nice function but the problem with it is that it is part of the Statistics Toolbox and that costs extra. I was recently approached by a user who needed access to the pdist function but all of the statistics toolbox license tokens on our network were in use and this led to the error message
??? License checkout failed. License Manager Error -4 Maximum number of users for Statistics_Toolbox reached. Try again later. To see a list of current users use the lmstat utility or contact your License Administrator
One option, of course, is to buy more licenses for the statistics toolbox but there is another way. You may have heard of GNU Octave, a free, open-source MATLAB-like program that has been in development for many years. Well, Octave has a sister project called Octave-Forge which aims to provide a set of free toolboxes for Octave. It turns out that not only does Octave-Forge contain a statistics toolbox but that toolbox contains a pdist function. I wondered how hard it would be to take Octave-Forge's pdist function and modify it so that it ran on MATLAB.
Not very! There is a script called oct2mat that is designed to automate some of the translation but I chose not to use it – I prefer to get my hands dirty you see. I named the resulting function octave_pdist to help clearly identify the fact that you are using an Octave function rather than a MATLAB function. This may matter if one or the other turns out to have bugs. It appears to work rather nicely:
dists_oct = octave_pdist(X,'euclidean');
% let's see if it agrees with the stats toolbox version
all( abs(dists_oct-dists)<1e-10)
ans =
1
Let’s look at timings on a slightly bigger problem.
>> X=rand(1000,2);
>> tic;matdists=pdist(X,'euclidean');toc
Elapsed time is 0.018972 seconds.
>> tic;octdists=octave_pdist(X,'euclidean');toc
Elapsed time is 6.644317 seconds.
Uh-oh! The Octave version is 350 times slower (for this problem) than the MATLAB version. Ouch! As far as I can tell, this isn’t down to my dodgy porting efforts, the original Octave pdist really does take that long on my machine (Ubuntu 9.10, Octave 3.0.5).
This was far too slow to be of practical use and we didn’t want to be modifying algorithms so we ditched this function and went with the NAG Toolbox for MATLAB instead (routine g03ea in case you are interested) since Manchester effectively has an infinite number of licenses for that product.
If, however, you’d like to play with my MATLAB port of Octave’s pdist then download it below.
- octave_pdist.m makes use of some functions in the excellent NaN Toolbox so you will need to download and install that package first.
Reading comma separated value (csv) files into MATLAB is trivial as long as the csv file you are trying to import is trivial. For example, say you wanted to import the file very_clean.txt which contains the following data
1031,-948,-76
507,635,-1148
-1031,948,750
-507,-635,114
The following very simple command is all that you need
>> veryclean = csvread('very_clean.txt')
veryclean =
1031 -948 -76
507 635 -1148
-1031 948 750
-507 -635 114
In the real world, however, your data is rarely this nice and clean. One of the most common problems faced by MATLABing data importers is that of header lines. Take the file quite_clean.txt for instance. This is identical to the previous example apart from the fact that it contains two header lines
These are some data that I made using my hand-crafted code
Date:12th July 1996
1031,-948,-76
507,635,-1148
-1031,948,750
-507,-635,114
This is all too much for the csvread command
>> data=csvread('quite_clean.txt')
??? Error using ==> dlmread at 145
Mismatch between file and format string.
Trouble reading number from file (row 1, field 1) ==> This
Error in ==> csvread at 52
m=dlmread(filename, ',', r, c);
Not to worry, we can just use the more capable importdata command instead
>> quiteclean = importdata('quite_clean.txt')
quiteclean =
data: [4x3 double]
textdata: {2x1 cell}
The result above is a structure with two fields and our numerical values are contained in the field called data. Here’s how you get at it.
>> quiteclean.data
ans =
1031 -948 -76
507 635 -1148
-1031 948 750
-507 -635 114
So far so good. How do we handle a file like messy_data.txt though?
header 1;
header 2;
1031,-948,-76, ,"12"
507,635,-1148, ,"34"
-1031,948,750, ,"45"
-507,-635,114, ,"67"
This is the kind of file encountered by Walking Randomly reader ‘reen’ and it contains exactly the same numerical values as the previous two examples. Unfortunately, it also contains some cruft that makes life more difficult for us. Let’s bring out the big-guns!
Using textscan to import csv files in MATLAB
When the going gets tough, the tough use textscan. Here’s the incantation for importing messy_data.txt
fid=fopen('messy_data.txt');
data = textscan(fid,'%f %f %f %*s %*s','HeaderLines',2,'Delimiter',',','CollectOutput',1);
fclose(fid)
The result is a one-element cell array that contains an array of doubles. Let’s get the array of doubles out of the cell
>> data=data{1}
data =
1031 -948 -76
507 635 -1148
-1031 948 750
-507 -635 114
If the importdata command is a chauffeur then textscan is a Ferrari and I don’t know about you but I’d much rather be driving my own Ferrari than being chauffeured around (John Cook over at The Endeavour has more to say on Ferraris and Chauffeurs).
Let’s de-construct the above set of commands. The first thing to notice is that, unlike csvread and importdata, you have to explicitly open and close your file when using the textscan command. So, you open your file using fopen, which returns a file ID (fid in this example).
fid=fopen('messy_data.txt');
The first argument to textscan is just this file ID, fid. Next you need to supply a conversion specifier which in this case is
'%f %f %f %*s %*s'
The conversion specifier tells textscan what you want each row in your csv file to be converted to. %f means “64 bit double” and %s means “string” so ‘%f %f %f %s %s’ means “3 doubles followed by 2 strings” (we’ll get onto the asterisks in the original specifier later). You can use all sorts of data types in a conversion specifier such as integers, quoted strings and pattern matched strings among others. Check out the MATLAB documentation for textscan for the full list but an abbreviated list is shown below:
%d    signed 32bit integer
%u    unsigned 32bit integer
%f    64bit double (you'll want this most of the time when using MATLAB)
%s    string
Now, in the command I used to import messy_data.txt the conversion specifier contained some asterisks such as %*s so what do these mean? Quite simply, the asterisk just means ‘ignore’ so %*s means ‘ignore the string in this field’. So, the full meaning of my conversion specifier ‘%f %f %f %*s %*s’ is “read 3 doubles and ignore 2 strings” and textscan will do this for every row.
The rest of the command is pretty self explanatory but I’ll explain it anyway for the sake of completeness
'HeaderLines',2
The file has 2 headerlines which should be ignored
'Delimiter',','
The fields are delimited (a posh word for separated) by a comma
'CollectOutput',1
If you supply a 1 (which stands for True) to the CollectOutput option then textscan will join consecutive output cells with the same data type into a single array. Since I want all of my doubles to be in a single array, this is the behaviour I went for.
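To see the difference, here is what you get if you leave CollectOutput out (a quick sketch, using the same messy_data.txt as before):
fid = fopen('messy_data.txt');
data = textscan(fid,'%f %f %f %*s %*s','HeaderLines',2,'Delimiter',',');
fclose(fid);
% data is now a 1x3 cell array rather than a single cell containing a 4x3 matrix.
% data{1}, data{2} and data{3} each hold one column of doubles and you would have
% to stitch them together yourself, e.g. alldata = [data{1} data{2} data{3}];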
Finally, once you have finished textscanning, don’t forget to close your file
fclose(fid)
That’s pretty much it for this mini-tutorial – I hope you find it useful.
Late last year I was asked to give a talk at my University to a group of academics studying financial mathematics (Black-Scholes, Monte Carlo simulations, time series analysis – that sort of thing). The talk was about the NAG Numerical Library, what it could do, what environments you can use it from, licensing and how they could get their hands on it via our site license.
I also spent a few minutes talking about Python including why I thought it was a good language for numerical computing and how to interface it with the NAG C library using my tutorials and PyNAG code. I finished off with a simple graphical demonstration that minimised the Rosenbrock function using Python, Matplotlib and the NAG C-library running on Ubuntu Linux.
“So!”, I asked, “Any questions?”
The very first reply was “Does your Python-NAG interface work on Windows machines?” to which the answer at the time was “No!” I took the opportunity to ask the audience how many of them did their numerical computing in Windows (Most of the room of around 50+ people), how many people did it using Mac OS X (A small group at the back), and how many people did it in Linux (about 3).
So, if I wanted the majority of that audience to use PyNAG then I had to get it working on Windows. Fortunately, thanks to the portability of Python and the consistency of the NAG library across platforms, this proved to be rather easy to do and the result is now available for download on the main PyNAG webpage.
Let’s look at the main differences between the Linux and Windows versions
Loading the NAG Library
In the examples given in my Linux NAG-Python tutorials, the cllux08dgl NAG C-library was loaded using the line
libnag = cdll.LoadLibrary("/opt/NAG/cllux08dgl/lib/libnagc_nag.so.8")
In Windows the call is slightly different. Here it is for CLDLL084ZL (Mark 8 of the C library for Windows)
C_libnag = ctypes.windll.CLDLL084Z_mkl
If you are modifying any of the codes shown in my NAG-Python tutorials then you’ll need to change this line yourself. If you are using PyNAG, however, then it will already be done for you.
Callback functions
For my full introduction to NAG callback functions on Linux click here. The main difference between using callbacks on Windows compared to Linux is that on Linux we have
C_FUNC = CFUNCTYPE(c_double,c_double )
but on Windows we have
C_FUNC = WINFUNCTYPE(c_double,c_double )
There are several callback examples in the PyNAG distribution.
That’s pretty much it I think. Working through all of the PyNAG examples and making sure that they ran on Windows uncovered one or two bugs in my codes that didn’t affect Linux for one reason or another so it was a useful exercise all in all.
So, now you can head over to the main PyNAG page and download the Windows version of my Python/NAG interface which includes a set of example codes. I also took the opportunity to throw in a couple of extra features and so upgraded PyNAG to version 0.16, check out the readme for more details. Thanks to several employees at NAG for all of their help with this including Matt, John, David, Marcin and Sorin.
I work for the University of Manchester in the UK as a ‘Science and Engineering Applications specialist’ which basically means that I am obsessed with software used by mathematicians and scientists. One of the applications within my portfolio is the NAG library – a product that we use rather a lot in its various incarnations. We have people using it from Fortran, C, C++, MATLAB, Python and even Visual Basic in areas as diverse as engineering, applied maths, biology and economics.
Yes, we are big users of NAG at Manchester but then that stands to reason because NAG and Manchester have a collaborative history stretching back 40 years to NAG’s very inception. Manchester takes a lot of NAG’s products but for reasons that are lost in the mists of time, we have never (to my knowledge at least) had a site license for their SMP library (more recently called The NAG Library for SMP & multicore). Very recently, that changed!
SMP stands for Symmetric Multi-Processor which essentially means ‘two or more CPUs sharing the same bit of memory.’ Five years ago, it would have been rare for a desktop user to have an SMP machine but these days they are everywhere. Got a dual-core laptop or a quad-core desktop? If the answer’s ‘yes’ then you have an SMP machine and you are probably wondering how to get the most out of it.
‘How can I use all my cores (without any effort)?’
One of the most common newbie questions I get asked these days goes along the lines of ‘Program X is only using 50%/25%/12.5% of my CPU – how can I fix this?’ and, of course, the reason for this is that the program in question is only using a single core of their multicore machine. So, the problem is easy enough to explain but not so easy to fix because it invariably involves telling the user that they are going to have to learn how to program in parallel.
Explicit parallel programming is a funny thing in that sometimes it is trivial and other times it is pretty much impossible – it all depends on the problem you see. Sometimes all you need to do is drop a few OpenMP pragmas here and there and you’ll get a 2-4 times speed up. Other times you need to completely rewrite your algorithm from the ground up to get even a modest speed up. Yet more times you are simply stuck because your problem is inherently non-parallelisable. It is even possible to slow your code down by trying to parallelize it!
If you are lucky, however, then you can make use of all those extra cores with no extra effort at all! Slowly but surely, mathematical software vendors have been re-writing some of their functions to ensure that they work efficiently on parallel processors in such a way that it is transparent to the user. This is sometimes referred to as implicit parallelism.
Take MATLAB, for example: ever since release 2007a, more and more built-in MATLAB functions have been modified to allow them to make full use of multi-processor systems. If your program makes extensive use of these functions then you don’t need to spend extra money or time on the parallel computing toolbox, just run your code on the latest version of MATLAB, sit back and enjoy the speed boost. It doesn’t work for all functions of course but it does for an ever increasing subset.
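If you want to see this implicit parallelism for yourself, something along the following lines should do it. This is only a rough sketch – it assumes a multicore machine and a MATLAB release recent enough to support maxNumCompThreads and multithreaded matrix multiplication.
A = rand(4000);                   % a large dense matrix
maxNumCompThreads(1);             % restrict MATLAB to one computational thread
tic; B = A*A; t_onecore = toc     % matrix multiply on a single core
maxNumCompThreads('automatic');   % let MATLAB use all available cores again
tic; B = A*A; t_allcores = toc    % the same multiply, now multithreaded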
The NAG SMP Library – zero effort parallel programming
For users of NAG routines, the zero-effort approach to making better use of your multicore system is to use their SMP library. According to NAG’s advertising blurb you don’t need to rewrite your code to get a speed boost – you just need to link to the SMP library instead of the Fortran library at compile time.
Just like MATLAB, you won’t get this speed boost for every function, but you will get it for a significant subset of the library (around 300+ functions as of Mark 22 – the full list is available on NAG’s website). Also, just like MATLAB, sometimes the speed-up will be massive and other times it will be much more modest.
I wanted to test NAG’s claims for myself on the kind of calculations that researchers at Manchester tend to perform so I asked NAG for a trial of the upcoming Mark 22 of the SMP library and, since I am lazy, I also asked them to provide me with simple demonstration programs and I’d like to share the results with you all. These are not meant to be an exhaustive set of benchmarks (I don’t have the time to do such a thing) but should give you an idea of what you can expect from NAG’s new library.
System specs
All of these timings were made on a 3GHz Intel Core2 Quad Q9650 CPU desktop machine with 8GB of RAM running Ubuntu 9.10. The serial version of the NAG library used was fll6a22dfl and the SMP version of the library was fsl6a22dfl. The version of gfortran used was 4.4.1 and the cpu-frequency governor was switched off as per a previous blog post of mine.
Example 1 – Nearest correlation matrix
The first routine I looked at was one that calculated the nearest correlation matrix. In other words ‘Given a symmetric matrix, what is the nearest correlation matrix—that is, the nearest symmetric positive semidefinite matrix with unit diagonal?’[1] This is a problem that pops up a lot in Finance – option pricing, risk management – that sort of thing.
The NAG routine that calculates the nearest correlation matrix is G02AAF which is based on the algorithm developed by Qi and Sun[2]. The example program I was sent first constructs a N x N tridiagonal matrix A of the form A(j,j)=2, A(j-1,j)=-1.1 and A(j,j-1)=1.0. It then times how long it takes G02AAF to find the nearest correlation matrix to A. You can download this example, along with all supporting files, from the links below.
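For reference, the starting matrix is easy to build yourself. A quick MATLAB sketch (purely illustrative – the benchmark itself constructs it in Fortran) would be
N = 5;                                                        % small N just so you can inspect it
A = 2*eye(N) - 1.1*diag(ones(N-1,1),1) + 1.0*diag(ones(N-1,1),-1)   % 2 on the diagonal, -1.1 above, 1.0 below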
To compile this benchmark against the serial library I did
gfortran ./bench_g02aaf.f90 shtim.c wltime.f /opt/NAG/fll6a22dfl/lib/libnag_nag.a \
  /opt/NAG/fll6a22dfl/acml/libacml_mv.a -o serial_g02aaf
To compile the parallel version I did
gfortran -fopenmp ./bench_g02aaf.f90 shtim.c wltime.f /opt/NAG/fsl6a22dfl/lib/libnagsmp.a \
  /opt/NAG/fsl6a22dfl/acml4.3.0/lib/libacml_mp.a -o smp_g02aaf
I didn’t need to modify the source code in any way when going from a serial version to a parallel version of this benchmark. The only work required was to link to the SMP library instead of the serial library – so far so good. Let’s run the two versions and see the speed differences.
First things first, let’s ensure that there is no limit to the stack size of my shell by doing the following in bash
ulimit -s unlimited
Also, for the parallel version, I need to set the number of threads to equal the number of processors I have on my machine by setting the OMP_NUM_THREADS environment variable.
export OMP_NUM_THREADS=4
This won’t affect the serial version at all. So, onto the program itself. When you run it, it will ask you for two inputs – an integer, N, and a boolean, IO. N gives the size of the starting matrix and IO determines whether or not to output the result.
Here’s an example run of the serial version with N=1000 and IO set to False. (In Fortran True is represented as .t. and False is represented as .f. )
./serial_g02aaf
G02AAF: Enter N, IO
1000 .f.
G02AAF: Time, IFAIL = 37.138997077941895 0
G02AAF: ITER, FEVAL, NRMGRD = 4 5 4.31049822255544465E-012
The only output I am really interested in here is the time and 37.13 seconds doesn’t seem too bad for a 1000 x 1000 matrix at first glance. Move to the parallel version though and you get a very nice surprise
./smp_g02aaf
G02AAF: Enter N, IO
1000 .f.
G02AAF: Time, IFAIL = 5.1906139850616455 0
G02AAF: ITER, FEVAL, NRMGRD = 4 5 4.30898281428799420E-012
The above times were typical and varied little from run to run (although the SMP version varied by a bigger percentage than the serial version). However I averaged over 10 runs to make sure and got 37.14 s for the serial version and 4.88 s for the SMP version which gives a speedup of 7.6 times!
Now, this is rather impressive. Usually, when one parallelises over N cores then the very best you can expect in an ideal world is a speed up of just less than N times, so called ‘linear scaling’. Here we have N=4 and a speedup of 7.6 implying that NAG have achieved ‘super-linear scaling’ which is usually pretty rare.
I dropped them a line to ask what was going on. It turns out that when they looked at parallelising this routine they worked out a way of speeding up the serial version as well. This serial speedup will be included in the serial library in its next release, Mark 23, but the SMP library got it as of Mark 22.
So, some of that 7.6 times speedup is as a result of serial speedup and the rest is parallelisation. By setting OMP_NUM_THREADS to 1 we can force the SMP version of the benchmark to run on only one core and thus see how much of the speedup we can attribute to parallelisation:
export OMP_NUM_THREADS=1
./smp_g02aaf
G02AAF: Enter N, IO
1000 .f.
G02AAF: Time, IFAIL = 12.714214086532593 0
G02AAF: ITER, FEVAL, NRMGRD = 4 5 4.31152424170390294E-012
Recall that the 4 core version took an average of 4.88 seconds so the speedup from parallelisation alone is 2.6 times – much closer to what I expected to see. Now, it is probably worth mentioning that there is an extra level of complication (with parallel programming there is always an extra level of complication) in that some of this parallelisation speedup comes from extra work that NAG has done in their algorithm and some of it comes from the fact that they are linking to parallel versions of the BLAS and LAPACK libraries. We could go one step further and determine how much of the speed up comes from NAG’s work and how much comes from using parallel BLAS/LAPACK but, well, life is short.
The practical upshot is that if you come straight from the Mark 22 serial version then you can expect a speed-up of around 7.6 times. In the future, when you compare the Mark 22 SMP library to the Mark 23 serial library then you can expect a speedup of around 2.6 times on a 4 core machine like mine.
Example 2 – Quasi random number generation
Quasi random numbers (also referred to as ‘Low discrepancy sequences’) are extensively used in Monte-Carlo simulations which have applications in areas such as finance, chemistry and computational physics. When people need a set of quasi random numbers they usually need a LOT of them and so the faster they can be produced the better. The NAG library contains several quasi random number generators but the one considered here is the routine g05ymf. The benchmark program I used is called bench_g05ymf.f90 and the full set of files needed to compile it are available at the end of this section.
The benchmark program requires the user to input 4 numbers and a boolean as follows.
- The first number is the quasi random number generator to use with the options being:
- NAG’s newer Sobol generator (based on the 2008 Joe and Kuo algorithm[3] )
- NAG’s older Sobol generator
- NAG’s Niederreiter generator
- NAG’s Faure generator
- The second number is the order in which the generated values are returned (The parameter RCORD as referred to in the documentation for g05ymf). Say that the matrix of returned values is called QUAS then if RCORD=1, QUAS(i,j) holds the jth value for the ith dimension, otherwise QUAS(i,j) holds the ith value for the jth dimension.
- The third number is the number of dimensions required.
- The fourth number is the number of quasi-random numbers required.
- The boolean (either .t. or .f.) determines whether or not the output should be verbose or not. A value of .t. will output the first 100 numbers in the sequence.
To compile the benchmark program against the serial library I did:
gfortran ./bench_g05ymf.f90 shtim.c wltime.f /opt/NAG/fll6a22dfl/lib/libnag_nag.a \
  /opt/NAG/fll6a22dfl/acml/libacml_mv.a -o serial_g05ymf
As before, the only thing needed to turn this into a parallel program was to compile against the SMP library and add the -fopenmp switch
gfortran -fopenmp ./bench_g05ymf.f90 shtim.c wltime.f /opt/NAG/fsl6a22dfl/lib/libnagsmp.a \
  /opt/NAG/fsl6a22dfl/acml4.3.0/lib/libacml_mp.a -o smp_g05ymf
The first set of parameters I used was
1 2 900 1000000 .f.
Which uses NAG’s new Sobol generator with RCORD=2 to generate and store 1,000,000 numbers over 900 dimensions with no verbose output. Averaged over 10 runs the times were 5.17 seconds for the serial version and 2.14 seconds for the parallel version giving a speedup of 2.4x on a quad-core machine. I couldn’t push the number of dimensions much higher because the benchmark stores all of the numbers in one large array and I was starting to run out of memory.
If you only want a relatively small sequence then switching to the SMP library is actually slower thanks to the overhead involved in spawning extra threads. For example if you only want 100,000 numbers over 8 dimensions:
1 2 8 100000 .f.
then the serial version of the code takes an average of 0.0048 seconds compared to 0.0479 seconds for the parallel version so the parallel version is almost 10 times slower when using 4 threads for small problems. Setting OMP_NUM_THREADS to 1 gives exactly the same speed as the serial version as you might expect.
NAG have clearly optimized this function to ensure that you get a good speedup for large problem sets which is where it really matters so I am very pleased with it. However, the degradation in performance for smaller problems concerns me a little. I think I’d prefer it if NAG were to modify this function so that it works serially for small sequences and to automatically switch to parallel execution if the user requests a large sequence.
Conclusions
In the old days we could speedup our programs simply by buying a new computer. The new computer would have a faster clock speed than the old one and so our code would run faster with close to zero effort on our part. Thanks to the fact that clock speeds have stayed more or less constant for the last few years those days are over. Now, when we buy a new computer we get more processors rather than faster ones and this requires a change in thinking. Products such as the NAG Library for SMP & multicore help us to get the maximum benefit out of our machines with the minimum amount of effort on our part. If switching to a product like this doesn’t give you the performance boost you need then the next thing for you to do is to learn how to program in parallel. The free ride is over.
In summary:
- You don’t need to modify your code if it already uses NAG’s serial library. Just recompile against the new library and away you go. You don’t need to know anything about parallel programming to make use of this product.
- The SMP Library works best with big problems. Small problems don’t work so well because of the inherent overheads of parallelisation.
- On a quad-core machine you can expect speed-ups of around 2-4 times compared to the serial library. In exceptional circumstances you can expect speed-ups as large as 7 times or more.
- You should notice a speed increase in over 300 functions compared to the serial library.
- Some of this speed increase comes from fundamental libraries such as BLAS and LAPACK, some of it comes from NAG directly parallelising their algorithms and yet more comes from improvements to the underlying serial code.
- I’m hoping that this Fortran version is just the beginning. I’ve always felt that NAG program in Fortran so I don’t have to and I’m hoping that they will go on to incorporate their SMP improvements in their other products, especially their C library and MATLAB toolbox.
Acknowledgements
Thanks to several staff at NAG who suffered my seemingly endless stream of emails during the writing of this article. Ed, in particular, has the patience of a saint. Any mistakes left over are all mine!
Links
- The Nearest Correlation Matrix benchmark program used here – bench_g02aaf.f90 – along with the source code for the timing routine.
- NAG’s documentation for the g02aaf routine.
- The Quasi Random Number benchmark program used here – bench_g05ymf.f90 – along with the source code for the timing routine.
- An Introduction to quasi random numbers (written by a member of NAG’s staff)
- Official NAG documentation for g05ymf – the quasi random number generator considered here.
References
[1] – Higham N (2002), Computing the nearest correlation matrix—a problem from finance, IMA Journal of Numerical Analysis 22(3), 329–343
[2] – Qi H and Sun D (2006), A Quadratically Convergent Newton Method for Computing the Nearest Correlation Matrix, SIAM J. Matrix Anal. Appl. 29(2), 360–385
[3] – Joe S and Kuo F Y (2008), Constructing Sobol sequences with better two-dimensional projections, SIAM J. Sci. Comput. 30, 2635–2654
Disclaimer: This is a personal blog and the opinions expressed on it are mine and do not necessarily reflect the policies and opinions of my employer, The University of Manchester.
I was recently playing with some parallel code that used gfortran and OpenMP to parallelise a calculation over 4 threads and was getting some seriously odd timing results. The first time I ran the code it completed in 4.8 seconds, the next run took 4.8 seconds but the third run took 84 seconds (No, I haven’t missed off the decimal point). Subsequent timings were all over the place – 12 seconds, 4.8 seconds again, 14 seconds….something weird was going on.
I wouldn’t mind but all of these calculations had exactly the same input/output and yet sometimes this parallel code took significantly longer to execute than the serial version.
On drilling down I discovered that two of the threads had 100% CPU utilisation and the other two only had 50%. On a hunch I wondered if the CPU frequency ‘on demand’ scaling thing was messing things up (it has done so in the past) so I did
sudo cpufreq-selector -c 0 -f 3000
sudo cpufreq-selector -c 1 -f 3000
sudo cpufreq-selector -c 2 -f 3000
sudo cpufreq-selector -c 3 -f 3000
to set the cpu-frequency of all 4 cores to the maximum 3GHz. This switched off the ‘on demand’ setting that is standard in Ubuntu.
Lo! it worked! 4.8 seconds every time. When I turned the governor back on
sudo cpufreq-selector --governor ondemand
I got back the ‘sometimes it’s fast, sometimes it’s slow’ behaviour. Oddly, one week and several reboots later I can’t get back the slow behaviour no matter what I set the governor to.
Perhaps this was just a temporary glitch in my system but, as I said earlier, I have seen this sort of behaviour before so, just to be on the safe side, it might be worth switching off the automatic governor whenever you do parallel calculations in Linux.
Does anyone have any insight into this? Comments welcomed.
A Mathematica user recently asked me if I could help him speed up the following Mathematica code. He has some data and he wants to model it using the following model
dFIO2 = 0.8;
PBH2O = 732;
dCVO2[t_] := 0.017 Exp[-4 Exp[-3 t]];
dPWO2[t_?NumericQ, v_?NumericQ, qL_?NumericQ, q_?NumericQ] :=
PBH2O*((v/(v + qL))*dFIO2*(1 - E^((-(v + qL))*t)) +
q*NIntegrate[E^((v + qL)*(\[Tau] - t))*dCVO2[\[Tau]], {\[Tau], 0, t}])
Before starting to apply this to his actual dataset, he wanted to check out the performance of the NonlinearModelFit[] command. So, he created a completely noise free dataset directly from the model itself
data = Table[{t, dPWO2[t, 33, 25, 55]}, {t, 0, 6, 6./75}];
and then, pretending that he didn’t know the parameters that formed that dataset, he asked NonlinearModelFit[] to find them:
fittedModel = NonlinearModelFit[data, dPWO2[t, v, qL, q], {v, qL, q}, t]; // Timing
params = fittedModel["BestFitParameters"]
On my machine this calculation completes in 11.21 seconds and gives the result we expect:
{11.21, Null}
{v -> 33., qL-> 25., q -> 55.}
11.21 seconds is too slow though since he wants to evaluate this on real data many hundreds of times. How can we make it any faster?
I started off by trying to make NIntegrate go faster since this will be evaluated potentially hundreds of times by the optimizer. Turning off Symbolic pre-processing did the trick nicely in this case. To turn off Symbolic pre-processing you just add
Method -> {Automatic, "SymbolicProcessing" -> 0}
to the end of the NIntegrate command. So, we re-define his model function as
dPWO2[t_?NumericQ, v_?NumericQ, qL_?NumericQ, q_?NumericQ] :=
PBH2O*((v/(v + qL))*dFIO2*(1 - E^((-(v + qL))*t)) +
q*NIntegrate[E^((v + qL)*(\[Tau] - t))*dCVO2[\[Tau]], {\[Tau], 0, t}
,Method -> {Automatic, "SymbolicProcessing" -> 0}])
This speeds things up almost by a factor of 2.
fittedModel = NonlinearModelFit[data, dPWO2[t, v, qL, q], {v, qL, q}, t]; // Timing
params = fittedModel["BestFitParameters"]
{6.13, Null}
{v -> 33., qL -> 25., q -> 55.}
Not bad but can we do better?
Giving Mathematica a helping hand.
Like most optimizers, NonlinearModelFit[] will work a lot faster if you give it a decent starting guess. For example if we provide a close-ish guess at the starting parameters then we get a little more speed
fittedModel =
NonlinearModelFit[data,
dPWO2[t, v, qL, q], {{v, 25}, {qL, 20}, {q, 40}},
t]; // Timing
params = fittedModel["BestFitParameters"]
{5.78, Null}
{v -> 33., qL -> 25., q -> 55.}
Give an even better starting guess and we get yet more speed
fittedModel =
NonlinearModelFit[data,
dPWO2[t, v, qL, q], {{v, 31}, {qL, 24}, {q, 54}},
t]; // Timing
params = fittedModel["BestFitParameters"]
{4.27, Null}
{v -> 33., qL -> 25., q -> 55.}
Ask the audience
So, that’s where we are up to so far. Neither of us has much experience with this particular part of Mathematica so we are hoping that someone out there can speed this calculation up even further. Comments are welcomed.
A little while ago I was having a nice tea break when all hell broke loose. Complaint after complaint started rolling in about lack of network licenses for the MATLAB statistics toolbox and everyone was wondering what I intended to do about it. A quick look at our license server indicated that someone had started a large number of MATLAB jobs on a Beowulf cluster and all of them used the statistics toolbox. Slowly but surely, he was eating up every statistics toolbox license in the university.
I contacted him, explained the problem and he terminated all of his jobs. Life returned to normal for the rest of the university but now he couldn’t work. What to do?
One option was to compile his code using the MATLAB compiler and send the resulting binaries to the cluster but I took another route. I had a look through his code and discovered that he was only ever calling one function from the Statistics toolbox and that was
random('unif',0,1)
All this does is give a uniform random number between 0 and 1. However, if you code it like this
rand()
then you won’t use a license from the statistics toolbox since rand() is part of basic MATLAB. Of course most people don’t want a random number between 0 and 1; instead they will want a random number between two constants, a and b. In almost all cases the following two statements are mathematically equivalent
r = a + (b-a).*rand(); %Doesn't use a license for the statistics toolbox
r = random('unif',a,b); %Does use a license for the statistics toolbox
The statistics toolbox function, random(), is much more general than rand() since it can give you random numbers from many different distributions but you are wasting a license token if you use it for the uniform distribution. The same goes for the normal distribution for that matter since basic MATLAB has randn().
random('norm',mu,sigma); %does use a stats license
r = randn()* sigma + mu; %doesn't use a stats license
The moral of the story is that when you are using MATLAB in a network licensed environment, it can sometimes pay to consider how you spell your functions ;)
The Problem
A MATLAB user came to me with a complaint recently. He had a piece of code that made use of the MATLAB Statistics toolbox but couldn’t get reliable access to a stats toolbox license from my employer’s server. You see, although we have hundreds of licenses for MATLAB itself, we only have a few dozen licenses for the statistics toolbox. This has always served us well in the past but use of this particular toolbox is on the rise and so we sometimes run out which means that users are more likely to get the following error message these days
??? License checkout failed. License Manager Error -4 Maximum number of users for Statistics_Toolbox reached. Try again later. To see a list of current users use the lmstat utility or contact your License Administrator.
The user had some options if he wanted to run his code via our shared network licenses for MATLAB (rather than rewrite in a free language or buy his own, personal license for MATLAB):
- Wait until a license became free.
- Wait until we found funds for more licenses.
- Buy an additional license token for himself and add it to the network (235 pounds +VAT currently)
- Allow me to have a look at his code and come up with another way
He went for the final option. So, I sat with a colleague of mine and we looked through the code. It turned out that there was only one line in the entire program that used the Statistics toolbox and all that line did was call binopdf. That one function call almost cost him 235 quid!
Now, binopdf() simply evaluates the binomial probability density function and the definition doesn’t look too difficult so you may wonder why I didn’t just cook up my own version of binopdf for him? Quite simply, I don’t trust myself to do as good a job as The Mathworks. There are bound to be gotchas and I am bound to not know them at first. What would you rather use, binopdf.m from The Mathworks or binopdf.m from ‘Dodgy Sysadmin’s Functions Inc.’? Your research depends upon this function remember…
NAG to the rescue!
I wanted to use a version of this function that was written by someone I trust so my mind immediately turned to NAG (Numerical Algorithms Group) and their MATLAB toolbox which my employer has a full site license for. The NAG Toolbox for MATLAB has a function called g01bj which does what we need and a whole lot more. The basic syntax is
[plek, pgtk, peqk, ifail] = g01bj(n, p, x)
From the NAG documentation: Let X denote a random variable having a binomial distribution with parameters n and p. g01bj calculates the following probabilities
- plek = Prob{X<=x}
- pgtk = Prob{X > x}
- peqk = Prob{X = x}
MATLAB’s binopdf only calculates Prob(X=x) so the practical upshot is that for the following MATLAB code
n=200; x=0; p=0.02; y=binopdf(x,n,p)
the corresponding NAG Toolbox code (on a 32bit machine) is
n=int32(200); x=int32(0); p=0.02; [plek, pgtk, y, ifail] = g01bj(n, p, x);
both of these code snippets give
y =
0.0176
Further tests show that NAG’s and MATLAB’s results agree either exactly or to within what is essentially numerical noise for a range of values. The only thing that remained was to make NAG’s function look more ‘normal’ for the user in question which meant creating a file called nag_binopdf.m which contained the following
function pbin = nag_binopdf(x,n,p)
    x=int32(x);
    n=int32(n);
    [~,~,pbin,~] = g01bj(n,p,x);
end
Now the user had a function called nag_binopdf that behaved just like the Statistics toolbox’s binopdf, for his particular application, argument order and everything.
It’s not all plain sailing though!
As good as the NAG toolbox is, it’s not perfect. Note that I placed particular emphasis on the fact that we had come up with a suitable, drop-in replacement for binopdf for this user’s particular application. The salient points of his application are:
- His input argument, x, is just a single value – not a vector of values. The NAG library wasn’t written with MATLAB in mind, it’s written in Fortran, and so many of its functions are not vectorised (some of them are though). In Fortran this is no problem at all but in MATLAB it’s a pain. I could, of course, write a nag_binopdf that looks like it’s vectorised by putting g01bj in a loop hidden away from the user (a rough sketch of such a wrapper is shown after this list) but I’d soon get found out because the performance would suck. To better support MATLAB users, NAG needs to write properly vectorised versions of its functions.
- He only wanted to run his code on 32bit machines. If he had wanted to run it on a 64bit machine then he would have to change all of those int32() calls to int64(). I agree that this is no big deal but it’s something that he wouldn’t have to worry about if he had coughed up the cash for a Mathworks statistics license.
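For what it’s worth, here is the kind of fake-vectorised wrapper I had in mind above. This is only a sketch (the name nag_binopdf_vec is my own invention) – it simply hides a loop over scalar g01bj calls, so expect poor performance on long vectors.
function pbin = nag_binopdf_vec(x,n,p)
% Element-by-element loop over the scalar NAG routine - looks vectorised
% from the outside but isn't really. Assumes scalar n and p, 32bit integers.
pbin = zeros(size(x));
for k = 1:numel(x)
    [~,~,pbin(k),~] = g01bj(int32(n), p, int32(x(k)));
end
end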
Final thoughts
I use the NAG toolbox for MATLAB to do this sort of thing all the time and have saved researchers at my university several thousand pounds over the last year or so through unneeded Mathworks toolbox licenses. As well as statistics, the NAG toolbox is also good for optimisation (local and global), curve fitting, splines, wavelets, finance, partial differential equations and probably more. Not everyone likes this approach and it is not always suitable but when it works, it works well. Sometimes, you even get a significant speed boost into the bargain (check out my interp1 article for example).
Comments are, as always, welcomed.
A friend of mine got me interested in JavaFX recently and my interest grew when I discovered that it had some nice charting functionality. Dean Iverson has written some great tutorials on the subject over at his blog and includes a link to a demo showing some of the different plot types that are available.
The demo is called ChartDemo and can be found here
http://pleasingsoftware.blogspot.com/2009/06/this-is-test.html
In an ideal world you simply have to click on the demo’s screenshot for it to download and launch but that wasn’t what happened for me. What happened when I clicked on it was nothing. No error messages…just nothing. It’s difficult to google ‘nothing happened’ and get something useful so I downloaded the demo (which had the filename ChartDemo.jnlp) and tried to launch it from the command line using
javaws ChartDemo.jnlp
This gave me the error message
netx: Unexpected net.sourceforge.jnlp.ParseException: Invalid XML document syntax. at net.sourceforge.jnlp.Parser.getRootNode(Parser.java:1200)
What follows is the story of how I eventually got this demo to work in the hope that it will help someone out there.
So, first things first, what are some of the relevant system specs I am using? Well, I am running 32bit Ubuntu Linux 9.10 (Karmic Koala) and
java -version
gives
java version "1.6.0_0" OpenJDK Runtime Environment (IcedTea6 1.6.1) (6b16-1.6.1-3ubuntu1) OpenJDK Server VM (build 14.0-b16, mixed mode)
Now, when I googled the error message I discovered that Linux (more specifically, I guess, the OpenJDK) is much more sensitive to xml errors than Windows/Mac OS X (.jnlp files are written in xml). Take double quotes for example; according to the W3C XML recommendations you should not use \" inside an xml attribute but should use &quot; instead. Some java implementations don’t seem to care but, at the time of writing at least, OpenJDK definitely does. Follow this link to see the original discussion thread where I learned this.
The practical upshot of this extra level of strictness is that .jnlp files that work just fine on Windows and Mac OS X won’t work on Linux and I guessed that was what was happening here. Sadly there were no examples of \" in ChartDemo.jnlp for me to change to &quot; so there must be something else ‘wrong’ with it; but what?
I decided to try the ‘stare at it until you figure it out’ approach to debugging and left the laptop on the side of the sofa while watching a movie on TV. About halfway through the movie, inspiration struck and I changed the line
<update check="background">
to
<update check="background"/>
which got things past the xml parsing stage. Sadly, I then hit another problem. Rather than a working ChartDemo, my efforts were rewarded with nothing more than just a blank window and a load of java errors in the terminal. When I say ‘a load’ I mean HUNDREDS and none of them looked particularly illuminating. I was starting to remember why I had avoided Java in the past but was not about to give up so easily.
Let’s take stock:
- The .jnlp file was fine (or at least didn’t return any parse errors)
- The ChartDemo code must be bug free because if it wasn’t then the author would have been told so rather quickly in the comments section of his blog
- My Java setup was presumably fine since I was able to run other JavaFX examples. For example I successfully worked through a JavaFX programming tutorial on Sun’s website without incident.
Of those three points I figured that the third one was the most likely to be wrong. It was OpenJDK’s handling of the .jnlp file that caused my first problem so maybe it was causing this second problem too. Could I switch from using OpenJDK to a different version of Java I wondered? Some googling ensued and I discovered some useful incantations.
I can list the versions of Java installed on my machine with the command
sudo update-java-alternatives -l
to get
java-6-openjdk 1061 /usr/lib/jvm/java-6-openjdk
java-6-sun 63 /usr/lib/jvm/java-6-sun
I can change from the openjdk to sun-java with
sudo update-java-alternatives -s java-6-sun
Once I did this I tried to run the ChartDemo.jnlp file again:
javaws ChartDemo.jnlp
and it worked perfectly. I was rewarded with a very nice demo of JavaFX’s charting functionality and Dean’s tutorials proved to be very useful to me. So useful in fact that I bought his book.
Incidentally, the java-6-sun version of java doesn’t care about the syntax of the .jnlp file quite so much as openjdk. However, if you want to change back to using openjdk you can do
sudo update-java-alternatives -s java-6-openjdk
I hope this little tale helps someone out there. Let me know if it does and also feel free to let me know if I have got anything wrong. My knowledge of all things Java is rather basic at the moment to say the least – something I am trying to change.
Ever since version 6 of Mathematica was released (way back in 2007) I have received dozens of queries from users at my University saying something along the lines of ‘Plotting no longer works in Mathematica’ or ‘If I wrap my plot in a Do loop then I get no output’. The user usually goes on to say that it worked perfectly well in pre-v6 versions of Mathematica, that it is clearly a bug and could I please file a bug report with Wolfram Research?
It’s not a bug….it’s a feature!
If you have googled your way here and just want a fast solution to your problem then here goes. I assume that you have code that looks like this
Do[
Plot[Cos[n x], {x, -Pi, Pi}]
, {n, 1, 3}
]
and you were expecting to get a set of nice plots (in this case 3) – one plot for each iteration of your loop. Instead, Mathematica 6 or above is giving you nothing…nada…zip! There are several ways you can ‘fix’ this and I am going to show you two. Option 1 is to wrap your Plot Command with Print
Do[
Print[Plot[Cos[n x], {x, -Pi, Pi}]]
, {n, 1, 3}
]
Option 2 is to use Table instead of Do
Table[
Plot[Cos[n x], {x, -Pi, Pi}]
, {n, 1, 3}
]
I am imagining that the average googler has now disappeared having got what they came for but some of you are possibly wondering why the above two ‘fixes’ work. Here’s my attempt at an explanation.
Old versions of Mathematica (v5.2 and below) treated graphics very differently to the way that Mathematica works today. The output of a command such as Plot[] used to contain two parts – the first was a Graphics object which looked a bit like the following in the notebook front end
– Graphics –
This was the OutputForm of the actual symbolic result of the Plot[] command. The second part – the actual visualization of the plot – was effectively just a side effect that happened to be what us humans actually wanted.
In version 6 and above, the OutputForm of a Graphics object is its visualisation which makes a lot more sense than just – Graphics –. The practical upshot of this is that there is now no need for a ‘side effect’ since the visualization of the plot IS the output. If you are confused then this bit is explained much more eloquently in an old Wolfram Blog post.
That’s the first part of the story regarding plots in loops. The second piece of the puzzle is to know that when you wrap a Mathematica expression in a Do loop then you suppress its output but you don’t suppress any side effects. In version 5.2 the plot is a side effect and so it gets displayed on each iteration of the loop. In version 6 and above, the plot is the output and so gets suppressed unless you explicitly ask for it to be displayed using Print[].
Does this help clear things up? Comments welcomed.
