April 20th, 2010 | Categories: Fortran, math software, NAG Library, parallel programming, programming | Tags:

I work for the University of Manchester in the UK as a ‘Science and Engineering Applications specialist’ which basically means that I am obsessed with software used by mathematicians and scientists.  One of the applications within my portfolio is the NAG library – a product that we use rather a lot in its various incarnations.  We have people using it from Fortran, C, C++, MATLAB, Python and even Visual Basic in areas as diverse as engineering, applied maths, biology and economics.

Yes, we are big users of NAG at Manchester but then that stands to reason because NAG and Manchester have a collaborative history stretching back 40 years to NAG’s very inception.  Manchester takes a lot of NAG’s products but for reasons that are lost in the mists of time, we have never (to my knowledge at least) had a site license for their SMP library (more recently called The NAG Library for SMP & multicore).  Very recently, that changed!

SMP stands for Symmetric Multi-Processor which essentially means ‘two or more CPUs sharing the same bit of memory.’  Five years ago, it would have been rare for a desktop user to have an SMP machine but these days they are everywhere.  Got a dual-core laptop or a quad-core desktop?  If the answer’s ‘yes’ then you have an SMP machine and you are probably wondering how to get the most out of it.
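If you want to check how many cores your own machine has, a quick way from Python (just a sketch, nothing NAG-specific) is:

```python
import multiprocessing

# Number of logical CPUs the operating system reports.
# On an SMP machine this will be 2 or more.
cores = multiprocessing.cpu_count()
print(cores)
```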

‘How can I use all my cores (without any effort)?’

One of the most common newbie questions I get asked these days goes along the lines of ‘Program X is only using 50%/25%/12.5% of my CPU – how can I fix this?’ and, of course, the reason for this is that the program in question is only using a single core of their multicore machine.  So, the problem is easy enough to explain but not so easy to fix because it invariably involves telling the user that they are going to have to learn how to program in parallel.

Explicit parallel programming is a funny thing in that sometimes it is trivial and other times it is pretty much impossible – it all depends on the problem.  Sometimes all you need to do is drop a few OpenMP pragmas here and there and you’ll get a 2-4 times speed up.  Other times you need to completely rewrite your algorithm from the ground up to get even a modest speed up.  Sometimes you are simply stuck because your problem is inherently non-parallelisable.  It is even possible to slow your code down by trying to parallelise it!
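To make the ‘sometimes it is trivial’ case concrete, here is a minimal Python sketch of the same idea as dropping in an OpenMP pragma: an independent loop farmed out to a pool of workers. (Python’s threads won’t actually speed up CPU-bound work like this because of the GIL – the point here is the structure, not the timing.)

```python
from concurrent.futures import ThreadPoolExecutor

def f(x):
    # Some per-element work; each element is independent,
    # so the loop is trivially parallelisable.
    return x * x

data = list(range(10))

# Serial version: an ordinary loop.
serial = [f(x) for x in data]

# Parallel version: the same loop, handed to a pool of workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(f, data))

assert serial == parallel
```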

If you are lucky, however, then you can make use of all those extra cores with no extra effort at all!  Slowly but surely, mathematical software vendors have been re-writing some of their functions to ensure that they work efficiently on parallel processors in such a way that it is transparent to the user.  This is sometimes referred to as implicit parallelism.

Take MATLAB, for example.  Ever since release 2007a, more and more built-in MATLAB functions have been modified to make full use of multi-processor systems.  If your program makes extensive use of these functions then you don’t need to spend extra money or time on the Parallel Computing Toolbox – just run your code on the latest version of MATLAB, sit back and enjoy the speed boost.  It doesn’t work for all functions, of course, but it does for an ever-increasing subset.

The NAG SMP Library – zero effort parallel programming

For users of NAG routines, the zero-effort approach to making better use of your multicore system is to use their SMP library.  According to NAG’s advertising blurb you don’t need to rewrite your code to get a speed boost – you just need to link to the SMP library instead of the Fortran library at compile time.

Just like MATLAB, you won’t get this speed boost for every function, but you will get it for a significant subset of the library (over 300 functions as of Mark 22 – the full list is available on NAG’s website).  Also, just like MATLAB, sometimes the speed-up will be massive and other times it will be much more modest.

I wanted to test NAG’s claims for myself on the kind of calculations that researchers at Manchester tend to perform, so I asked NAG for a trial of the upcoming Mark 22 of the SMP library and, since I am lazy, I also asked them to provide me with simple demonstration programs.  I’d like to share the results with you all.  These are not meant to be an exhaustive set of benchmarks (I don’t have the time to do such a thing) but they should give you an idea of what you can expect from NAG’s new library.

System specs

All of these timings were made on a desktop machine with a 3 GHz Intel Core2 Quad Q9650 CPU and 8 GB of RAM running Ubuntu 9.10.  The serial version of the NAG library used was fll6a22dfl and the SMP version of the library was fsl6a22dfl.  The version of gfortran used was 4.4.1 and the cpu-frequency governor was switched off as per a previous blog post of mine.

Example 1 – Nearest correlation matrix

The first routine I looked at was one that calculated the nearest correlation matrix. In other words ‘Given a symmetric matrix, what is the nearest correlation matrix—that is, the nearest symmetric positive semidefinite matrix with unit diagonal?’[1] This is a problem that pops up a lot in Finance – option pricing, risk management – that sort of thing.

The NAG routine that calculates the nearest correlation matrix is G02AAF, which is based on the algorithm developed by Qi and Sun[2].  The example program I was sent first constructs an N x N tridiagonal matrix A of the form A(j,j)=2, A(j-1,j)=-1.1 and A(j,j-1)=1.0.  It then times how long it takes G02AAF to find the nearest correlation matrix to A.  You can download this example, along with all supporting files, from the links below.
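Just to make the test matrix concrete, here is a quick Python sketch that builds it as a plain list-of-lists (purely illustrative – the benchmark itself does this in Fortran):

```python
def build_tridiagonal(n):
    """Build the N x N matrix with A(j,j)=2, A(j-1,j)=-1.1 and A(j,j-1)=1.0."""
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        A[j][j] = 2.0
        if j > 0:
            A[j - 1][j] = -1.1   # superdiagonal
            A[j][j - 1] = 1.0    # subdiagonal
    return A

A = build_tridiagonal(4)
```

Note that this matrix is not a correlation matrix (its diagonal entries are 2, not 1), which is exactly why it makes a sensible starting point for the benchmark.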

To compile this benchmark against the serial library I did

gfortran ./bench_g02aaf.f90 shtim.c wltime.f /opt/NAG/fll6a22dfl/lib/libnag_nag.a  \
/opt/NAG/fll6a22dfl/acml/libacml_mv.a -o serial_g02aaf

To compile the parallel version I did

gfortran -fopenmp ./bench_g02aaf.f90 shtim.c wltime.f /opt/NAG/fsl6a22dfl/lib/libnagsmp.a \
  /opt/NAG/fsl6a22dfl/acml4.3.0/lib/libacml_mp.a -o smp_g02aaf

I didn’t need to modify the source code in any way when going from a serial version to a parallel version of this benchmark. The only work required was to link to the SMP library instead of the serial library – so far so good. Let’s run the two versions and see the speed differences.

First things first, let’s ensure that there is no limit to the stack size of my shell by doing the following in bash

ulimit -s unlimited

Also, for the parallel version, I need to set the number of threads to equal the number of processors I have on my machine by setting the OMP_NUM_THREADS environment variable.

export OMP_NUM_THREADS=4

This won’t affect the serial version at all. So, onto the program itself. When you run it, it will ask you for two inputs – an integer, N, and a boolean, IO. N gives the size of the starting matrix and IO determines whether or not to output the result.

Here’s an example run of the serial version with N=1000 and IO set to False. (In Fortran True is represented as .t. and False is represented as .f. )

./serial_g02aaf
 G02AAF: Enter N, IO
1000 .f.
 G02AAF: Time, IFAIL =    37.138997077941895                0
 G02AAF: ITER, FEVAL, NRMGRD =            4           5  4.31049822255544465E-012

The only output I am really interested in here is the time, and 37.14 seconds doesn’t seem too bad for a 1000 x 1000 matrix at first glance. Move to the parallel version, though, and you get a very nice surprise:

./smp_g02aaf
 G02AAF: Enter N, IO
1000 .f.
 G02AAF: Time, IFAIL =    5.1906139850616455                0
 G02AAF: ITER, FEVAL, NRMGRD =            4           5  4.30898281428799420E-012

The above times were typical and varied little from run to run (although the SMP version varied by a bigger percentage than the serial version). However I averaged over 10 runs to make sure and got 37.14 s for the serial version and 4.88 s for the SMP version which gives a speedup of 7.6 times!

Now, this is rather impressive. Usually, when one parallelises over N cores, the very best you can expect in an ideal world is a speed up of just less than N times – so-called ‘linear scaling’. Here we have N=4 and a speedup of 7.6, implying that NAG have achieved ‘super-linear scaling’, which is usually pretty rare.

I dropped them a line to ask what was going on. It turns out that when they looked at parallelising this routine they worked out a way of speeding up the serial version as well. This serial speedup will be included in the serial library in its next release, Mark 23, but the SMP library got it as of Mark 22.

So, some of that 7.6 times speedup is a result of the serial improvements and the rest is parallelisation. By setting OMP_NUM_THREADS to 1 we can force the SMP version of the benchmark to run on only one core and thus see how much of the speedup we can attribute to parallelisation:

export OMP_NUM_THREADS=1
./smp_g02aaf
 G02AAF: Enter N, IO
1000 .f.
 G02AAF: Time, IFAIL =    12.714214086532593                0
 G02AAF: ITER, FEVAL, NRMGRD =            4           5  4.31152424170390294E-012

Recall that the 4 core version took an average of 4.88 seconds so the speedup from parallelisation alone is 2.6 times – much closer to what I expected to see.  Now, it is probably worth mentioning that there is an extra level of complication (with parallel programming there is always an extra level of complication) in that some of this parallelisation speedup comes from extra work that NAG has done in their algorithm and some of it comes from the fact that they are linking to parallel versions of the BLAS and LAPACK libraries.  We could go one step further and determine how much of the speed up comes from NAG’s work and how much comes from using parallel BLAS/LAPACK but, well, life is short.
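The arithmetic behind these figures is simple enough to lay out explicitly (a sketch using the timings quoted above; note the 12.71 s single-thread number is from the one run shown rather than a 10-run average):

```python
serial_mk22 = 37.14   # Mark 22 serial library, averaged over 10 runs (seconds)
smp_1thread = 12.71   # SMP library forced onto 1 thread (seconds)
smp_4thread = 4.88    # SMP library on 4 threads, averaged over 10 runs (seconds)

total_speedup    = serial_mk22 / smp_4thread   # serial improvements + parallelisation
serial_speedup   = serial_mk22 / smp_1thread   # from the improved serial code alone
parallel_speedup = smp_1thread / smp_4thread   # from parallelisation alone

print(round(total_speedup, 1), round(serial_speedup, 1), round(parallel_speedup, 1))
# → 7.6 2.9 2.6
```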

The practical upshot is that if you come straight from the Mark 22 serial version then you can expect a speed-up of around 7.6 times.  In the future, when you compare the Mark 22 SMP library to the Mark 23 serial library then you can expect a speedup of around 2.6 times on a 4 core machine like mine.

Example 2 – Quasi random number generation

Quasi random numbers (also referred to as ‘Low discrepancy sequences’) are extensively used in Monte-Carlo simulations which have applications in areas such as finance, chemistry and computational physics.  When people need a set of quasi random numbers they usually need a LOT of them and so the faster they can be produced the better.  The NAG library contains several quasi random number generators but the one considered here is the routine g05ymf.  The benchmark program I used is called bench_g05ymf.f90 and the full set of files needed to compile it are available at the end of this section.

The benchmark program requires the user to input 4 numbers and a boolean as follows.

  • The first number is the quasi random number generator to use with the options being:
    1. NAG’s newer Sobol generator (based on the 2008 Joe and Kuo algorithm[3] )
    2. NAG’s older Sobol generator
    3. NAG’s Niederreiter generator
    4. NAG’s Faure generator
  • The second number is the order in which the generated values are returned (The parameter RCORD as referred to in the documentation for g05ymf). Say that the matrix of returned values is called QUAS then if RCORD=1, QUAS(i,j) holds the jth value for the ith dimension, otherwise QUAS(i,j) holds the ith value for the jth dimension.
  • The third number is the number of dimensions required.
  • The fourth number is the number of quasi-random numbers required.
  • The boolean (either .t. or .f.) determines whether or not the output should be verbose.  A value of .t. will output the first 100 numbers in the sequence.
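NAG’s generators are sophisticated, but you can get a flavour of what a low-discrepancy sequence looks like from the classic van der Corput sequence in base 2 – the one-dimensional building block underlying Sobol-type generators. A short Python sketch (to be clear, this is not NAG’s algorithm):

```python
def van_der_corput(n, base=2):
    """Return the first n terms of the van der Corput sequence in the given base.

    Each term is obtained by reversing the base-b digits of the index
    about the radix point, which fills [0, 1) far more evenly than
    pseudo-random draws do.
    """
    seq = []
    for i in range(1, n + 1):
        x, denom = 0.0, 1.0
        k = i
        while k > 0:
            denom *= base
            k, remainder = divmod(k, base)
            x += remainder / denom
        seq.append(x)
    return seq

print(van_der_corput(4))  # → [0.5, 0.25, 0.75, 0.125]
```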

To compile the benchmark program against the serial library I did:

gfortran ./bench_g05ymf.f90 shtim.c wltime.f /opt/NAG/fll6a22dfl/lib/libnag_nag.a  \
/opt/NAG/fll6a22dfl/acml/libacml_mv.a -o serial_g05ymf

As before, the only thing needed to turn this into a parallel program was to compile against the SMP library and add the -fopenmp switch

gfortran -fopenmp ./bench_g05ymf.f90 shtim.c wltime.f /opt/NAG/fsl6a22dfl/lib/libnagsmp.a  \
/opt/NAG/fsl6a22dfl/acml4.3.0/lib/libacml_mp.a -o smp_g05ymf

The first set of parameters I used was

1 2 900 1000000 .f.

which uses NAG’s new Sobol generator with RCORD=2 to generate and store 1,000,000 numbers over 900 dimensions with no verbose output.  Averaged over 10 runs, the times were 5.17 seconds for the serial version and 2.14 seconds for the parallel version, giving a speedup of 2.4 times on a quad-core machine.  I couldn’t push the number of dimensions much higher because the benchmark stores all of the numbers in one large array and I was starting to run out of memory.

If you only want a relatively small sequence then switching to the SMP library is actually slower thanks to the overhead involved in spawning extra threads.  For example if you only want 100,000 numbers over 8 dimensions:

1 2 8 100000 .f.

then the serial version of the code takes an average of 0.0048 seconds compared to 0.0479 seconds for the parallel version, so the parallel version is almost 10 times slower when using 4 threads on this small problem. Setting OMP_NUM_THREADS to 1 gives exactly the same speed as the serial version, as you might expect.

NAG have clearly optimized this function to ensure that you get a good speedup for large problem sets, which is where it really matters, so I am very pleased with it.  However, the degradation in performance for smaller problems concerns me a little.  I’d prefer it if NAG modified this function so that it runs serially for small sequences and automatically switches to parallel execution when the user requests a large one.
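The kind of switch I have in mind is easy to sketch (the threshold and function names below are hypothetical – this is just the shape of the idea, not how NAG’s routine works):

```python
PARALLEL_THRESHOLD = 500_000  # hypothetical crossover point, found by benchmarking

def generate_serial(n):
    # Placeholder for a serial implementation.
    return list(range(n))

def generate_parallel(n):
    # Placeholder for a multi-threaded implementation.
    return list(range(n))

def generate(n):
    """Dispatch to a serial or parallel implementation based on problem size."""
    if n < PARALLEL_THRESHOLD:
        # Below the threshold the thread-spawning overhead dominates,
        # so the serial path wins.
        return generate_serial(n)
    return generate_parallel(n)

small = generate(8)  # takes the serial path
```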

Conclusions

In the old days we could speed up our programs simply by buying a new computer.  The new computer would have a faster clock speed than the old one and so our code would run faster with close to zero effort on our part.  Since clock speeds have stayed more or less constant for the last few years, those days are over.  Now, when we buy a new computer we get more processors rather than faster ones, and this requires a change in thinking.  Products such as the NAG Library for SMP & multicore help us to get the maximum benefit out of our machines with the minimum amount of effort on our part.  If switching to a product like this doesn’t give you the performance boost you need then the next thing for you to do is to learn how to program in parallel.  The free ride is over.

In summary:

  • You don’t need to modify your code if it already uses NAG’s serial library.  Just recompile against the new library and away you go.  You don’t need to know anything about parallel programming to make use of this product.
  • The SMP Library works best with big problems.  Small problems don’t work so well because of the inherent overheads of parallelisation.
  • On a quad-core machine you can expect speed-ups around 2-4 times compared to the serial library.  In exceptional circumstances you can expect speed-up as large as 7 times or more.
  • You should notice a speed increase in over 300 functions compared to the serial library.
  • Some of this speed increase comes from fundamental libraries such as BLAS and LAPACK, some of it comes from NAG directly parallelising their algorithms and yet more comes from improvements to the underlying serial code.
  • I’m hoping that this Fortran version is just the beginning.  I’ve always felt that NAG program in Fortran so I don’t have to and I’m hoping that they will go on to incorporate their SMP improvements in their other products, especially their C library and MATLAB toolbox.

Acknowledgements

Thanks to several staff at NAG who suffered my seemingly endless stream of emails during the writing of this article.  Ed, in particular, has the patience of a saint.  Any mistakes left over are all mine!

References

[1] – Higham N (2002), Computing the nearest correlation matrix—a problem from finance, IMA Journal of Numerical Analysis 22(3) 329–343

[2] – Qi H and Sun D (2006), A quadratically convergent Newton method for computing the nearest correlation matrix, SIAM J. Matrix Anal. Appl. 29(2) 360–385

[3] – Joe S and Kuo F Y (2008), Constructing Sobol sequences with better two-dimensional projections, SIAM J. Sci. Comput. 30 2635–2654

Disclaimer: This is a personal blog and the opinions expressed on it are mine and do not necessarily reflect the policies and opinions of my employer, The University of Manchester.

April 19th, 2010 | Categories: Linux, parallel programming, programming | Tags:

I was recently playing with some parallel code that used gfortran and OpenMP to parallelise a calculation over 4 threads and was getting some seriously odd timing results.  The first time I ran the code it completed in 4.8 seconds, the next run took 4.8 seconds but the third run took 84 seconds (No, I haven’t missed off the decimal point).  Subsequent timings were all over the place – 12 seconds, 4.8 seconds again, 14 seconds….something weird was going on.

I wouldn’t mind but all of these calculations had exactly the same input/output and yet sometimes this parallel code took significantly longer to execute than the serial version.

On drilling down I discovered that two of the threads had 100% CPU utilisation and the other two only had 50%.  On a hunch, I wondered if the CPU frequency ‘on demand’ scaling was messing things up (it has done so in the past), so I did

sudo cpufreq-selector -c 0 -f 3000
sudo cpufreq-selector -c 1 -f 3000
sudo cpufreq-selector -c 2 -f 3000
sudo cpufreq-selector -c 3 -f 3000

to set the cpu-frequency of all 4 cores to the maximum 3 GHz.  This switched off the ‘on demand’ setting that is standard in Ubuntu.

Lo! it worked! 4.8 seconds every time.  When I turned the governor back on

sudo cpufreq-selector --governor ondemand

I got back the ‘sometimes it’s fast, sometimes it’s slow’ behaviour. Oddly, one week and several reboots later I can’t get back the slow behaviour no matter what I set the governor to.

Perhaps this was just a temporary glitch in my system but, as I said earlier, I have seen this sort of behaviour before so, just to be on the safe side, it might be worth switching off the automatic governor whenever you do parallel calculations in Linux.

Does anyone have any insight into this? Comments welcomed.

April 14th, 2010 | Categories: math software, mathcad | Tags:

I’ve been following the progress of Mathcad Prime, the upcoming replacement for PTC’s Mathcad, for a few months now and have managed to get my hands on a few preview screen shots. Click on each image to obtain a full resolution version


Mathcad Prime matrix


Mathcad Prime plot


Mathcad Prime Programming

I’m particularly pleased to see that final image since it demonstrates that Mathcad Prime will support programming constructs. There were rumours a while ago that Prime would not be capable of either symbolics or programming which concerned me somewhat because I support users that require both.

A couple of weeks ago, PTC arranged a virtual conference where a couple of PTC developers demonstrated the latest build of Mathcad Prime.  The focus was very much on Prime’s ease of use and its ability to handle units and I have to say I was impressed – it seems to be a big improvement over Mathcad 14.  As long as symbolics makes it into Prime then I think I am going to be rather happy with it.

It is still possible to check out the recordings of the virtual conference yourself at http://events.unisfair.com/index.jsp?eid=522&seid=25 (requires registration).  If you’ve seen the presentation then I’d love to know what you think so drop me a comment and say hi.

April 13th, 2010 | Categories: walking randomly | Tags:

A shameless copy of The Endeavour’s weekend miscellany

Programming

Food

Technology

Life

April 9th, 2010 | Categories: general math, matlab | Tags:

After reading my recent post, Polygonal numbers on quadratic number spirals, Walking Randomly reader Matt Tearle came up with a MATLAB version and it looks great. Click on the image below for the code.

Polygonal numbers

April 5th, 2010 | Categories: Carnival of Math, general math | Tags:

The 64th Carnival of mathematics has been posted over at Maria’s Teaching College Math blog and it is rather different from the norm.  Posted as a mind map, it includes articles on a Murder Mystery project for logarithms, weightlifting for the brain, MATLAB, R, the mathematics behind a good night’s sleep and much more.

Carnival of math #64

If you are new to math carnivals and are wondering what is going on then check out my introductory post here.

Other recent carnivals include:

April 3rd, 2010 | Categories: Free software, math software, mathcad | Tags:

SMath Studio, Andrey Ivashov’s free Mathcad alternative for Windows, Linux and Windows Mobile, has recently been updated to version 0.87.3728 beta. This is a significant new release since it adds support for units – possibly one of the most requested features on Andrey’s user forums.

I last looked at SMath Studio back in September 2009 and Andrey has made a lot of progress since then. As well as adding support for units and fixing some bugs he has added a whole host of user interface improvements including

  • Dynamic assistance – SMath Studio can now be set to give you syntax hints and tips as you type your code.
  • Support for plugins – Now we can all write something for SMath Studio
  • A much smoother looking user interface – SMath Studio now looks much more professional than before

Smath studio on Linux

This fantastic free alternative to Mathcad is going from strength to strength and I am really enjoying watching its progress.  Andrey has produced something that is genuinely useful, multiplatform, fun to use and, most importantly of all, he listens to his users.  Well worth a download.

March 31st, 2010 | Categories: general math, mathematica, python | Tags:

Over at Mathrecreation, Dan Mackinnon has been plotting two dimensional polygonal numbers on quadratic number spirals using a piece of software called Fathom. Today’s train-journey home project for me was to reproduce this work using Mathematica and Python.

I’ll leave the mathematics to Mathrecreation and Numberspiral.com and just get straight to the code. Here’s the Mathematica version:

p[k_, n_] := ((k - 2) n (n + 1))/2 - (k - 3) n;
Manipulate[
 nums = Table[p[k, n], {n, 1, numpoints}];
 If[showpoints,
  ListPolarPlot[Table[{2*Pi*Sqrt[n], Sqrt[n]}, {n, nums}]
   , Joined -> joinpoints, Axes -> False,
   PlotMarkers -> {Automatic, Small}, ImageSize -> {400, 400}],
  ListPolarPlot[Table[{2*Pi*Sqrt[n], Sqrt[n]}, {n, nums}]
   , Joined -> joinpoints, Axes -> False, ImageSize -> {400, 400}
   ]
  ]
 , {k, 2, 50, Appearance -> "Labeled"}
 , {{numpoints, 100, "Number of points"}, {100, 250, 500}}
 , {{showpoints, True, "Show points"}, {True, False}}
 , {{joinpoints, True, "Join points"}, {True, False}}
 , SaveDefinitions :> True
 ]

A set of 4 screenshots is shown below (click the image for a higher resolution version). The formula for polygonal numbers, p(k,n), is only really valid for integer k and so I made a mistake when I allowed k to be any real value in the code above. However, I liked some of the resulting spirals so much that I decided to leave the mistake in.

When Dan showed the images he produced from Fathom, he mentioned that he would one day like to implement these systems in a more open platform. Well, you can’t get much more open than Python so here’s a piece of code that makes use of Python and the matplotlib package (download the file – polygonal_spiral.py).

#!/usr/bin/env python
import matplotlib.pyplot as plt
from math import *
 
def p(k,n):
   return(((k-2)*n*(n+1))/2 -(k-3)*n)
 
k=13
 
polygonal_nums = [p(k,n) for n in range(100)]
theta = [2*pi*sqrt(n) for n in polygonal_nums]
r = [sqrt(n) for n in polygonal_nums]
 
myplot = plt.polar(theta,r,'.-')
plt.gca().axis('off')
plt.show()

Now, something that I haven’t figured out yet is why the above python code works just fine for 100 points but if you increase it to 101 then you just get a single point. Does anyone have any ideas?

Polygonal numbers on a number spiral

Unlike my Mathematica version, the above Python program does not give you an interactive GUI but I’ll leave that as an exercise for the reader.

If you enjoyed this post then you might also like:

March 29th, 2010 | Categories: math software, mathematica, programming | Tags:

A Mathematica user recently asked me if I could help him speed up the following Mathematica code. He has some data and he wants to model it using the following model

dFIO2 = 0.8;
PBH2O = 732;
dCVO2[t_] := 0.017 Exp[-4 Exp[-3 t]];
dPWO2[t_?NumericQ, v_?NumericQ, qL_?NumericQ, q_?NumericQ] := 
 PBH2O*((v/(v + qL))*dFIO2*(1 - E^((-(v + qL))*t)) + 
    q*NIntegrate[E^((v + qL)*(\[Tau] - t))*dCVO2[\[Tau]], {\[Tau], 0, t}])

Before starting to apply this to his actual dataset, he wanted to check out the performance of the NonlinearModelFit[] command. So, he created a completely noise free dataset directly from the model itself

data = Table[{t, dPWO2[t, 33, 25, 55]}, {t, 0, 6, 6./75}];

and then, pretending that he didn’t know the parameters that formed that dataset, he asked NonlinearModelFit[] to find them:

fittedModel = NonlinearModelFit[data, dPWO2[t, v, qL, q], {v, qL, q}, t]; // Timing
params = fittedModel["BestFitParameters"]

On my machine this calculation completes in 11.21 seconds and gives the result we expect:

{11.21, Null} 
{v -> 33., qL-> 25., q -> 55.} 

11.21 seconds is too slow though since he wants to evaluate this on real data many hundreds of times. How can we make it any faster?

I started off by trying to make NIntegrate go faster since this will be evaluated potentially hundreds of times by the optimizer. Turning off Symbolic pre-processing did the trick nicely in this case. To turn off Symbolic pre-processing you just add

Method -> {Automatic, "SymbolicProcessing" -> 0}

to the end of the NIntegrate command. So, we re-define his model function as

dPWO2[t_?NumericQ, v_?NumericQ, qL_?NumericQ, q_?NumericQ] := 
 PBH2O*((v/(v + qL))*dFIO2*(1 - E^((-(v + qL))*t)) + 
    q*NIntegrate[E^((v + qL)*(\[Tau] - t))*dCVO2[\[Tau]], {\[Tau], 0, t}
,Method -> {Automatic, "SymbolicProcessing" -> 0}])

This speeds things up by almost a factor of 2.

fittedModel = NonlinearModelFit[data, dPWO2[t, v, qL, q], {v, qL, q}, t]; // Timing
params = fittedModel["BestFitParameters"]

{6.13, Null}
{v -> 33., qL -> 25., q -> 55.}

Not bad but can we do better?

Giving Mathematica a helping hand.

Like most optimizers, NonlinearModelFit[] will work a lot faster if you give it a decent starting guess. For example if we provide a close-ish guess at the starting parameters then we get a little more speed

fittedModel = 
   NonlinearModelFit[data, 
    dPWO2[t, v, qL, q], {{v, 25}, {qL, 20}, {q, 40}}, 
    t]; // Timing
params = fittedModel["BestFitParameters"]

{5.78, Null}
{v -> 33., qL -> 25., q -> 55.}

Give an even better starting guess and we get yet more speed:

fittedModel = 
   NonlinearModelFit[data, 
    dPWO2[t, v, qL, q], {{v, 31}, {qL, 24}, {q, 54}}, 
    t]; // Timing
params = fittedModel["BestFitParameters"]

{4.27, Null}
{v -> 33., qL -> 25., q -> 55.}
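The effect of a good starting guess is easy to demonstrate outside Mathematica, too. Here is a stdlib Python sketch using Newton’s method on f(x) = x² − 2 (nothing to do with NonlinearModelFit itself – just the general principle that optimizers and root-finders converge in fewer iterations from a nearby start):

```python
def newton_sqrt2(x0, tol=1e-12, max_iter=100):
    """Newton iterations for x^2 = 2; return (root, number of iterations)."""
    x = x0
    for i in range(1, max_iter + 1):
        x_new = 0.5 * (x + 2.0 / x)   # Newton step for f(x) = x^2 - 2
        if abs(x_new - x) < tol:
            return x_new, i
        x = x_new
    return x, max_iter

root_far, iters_far = newton_sqrt2(100.0)   # poor starting guess
root_near, iters_near = newton_sqrt2(1.5)   # good starting guess
assert iters_near < iters_far               # fewer iterations from the close guess
```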

Ask the audience

So, that’s where we are up to so far. Neither of us have much experience with this particular part of Mathematica so we are hoping that someone out there can speed this calculation up even further. Comments are welcomed.

March 24th, 2010 | Categories: math software | Tags:

I work for a large university in the UK and part of my job involves helping to look after our site licenses for software such as MATLAB, Origin, Labview, Mathematica, Abaqus and the NAG Library, among many others.  Now, anyone who works in this field will know that no two site licenses are alike.  For example, one license might allow any member of the university to install the software on any machine that they own whereas another is limited to university-owned machines.  There are lots of little rules like this and I could go on for quite a while about some of the gotchas, but I’d prefer to focus this post on something close to my heart right now – the matter of perpetual usage.

Imagine, if you will, that you are the coordinator for a degree course and you are completely overhauling the syllabus.  Part of this overhaul will concern the software that you use and teach as part of this course.  You want to standardize on just one package so that students won’t have to learn a new system every semester.  After reviewing all of the possibilities, you finally decide that it is going to be either software A or software B.  Both are powerful, easy to use, considered an industry standard, and so on.  As far as functionality goes there really isn’t much between them.  Which one to choose?

You invite your friendly, neighborhood software geek out for coffee and ask him to talk about the two software packages.  He likes talking about this sort of stuff and usually has a lot to say.  He talks about open-source alternatives, programming styles, speed, potentially useful books, other people in your field who use this type of software and so on.  Eventually he starts talking about licensing and offers the following tidbit of information.

  • Our license for Software A includes perpetual use.  This means that if our funding dried up we could stop paying our maintenance agreement but we would never lose access to the software.  Of course we’d not be able to upgrade to the latest version any more but we’d always have access to the version we have right now.
  • Our license for Software B doesn’t include perpetual use.  So, if we stopped paying our maintenance agreement then we can’t use the software anymore.  We’d have to stop using it for teaching, research….everything.  You’d have to go and buy your own license for it.

“How likely is it that we might stop funding either of them?” you ask.

“Oh, pretty unlikely,” comes the reply as he rests his coffee mug over that day’s newspaper headline “We are a big university and both of these packages are key to what we do.  I’d say we are very unlikely to stop funding either of them.”

I have a question to you all.  Would the lack of perpetual use for Software B be a factor in your decision concerning what you would use for your teaching and research?