October 7th, 2012 | Categories: Financial Math, Julia, just for fun, matlab | Tags:

I felt like playing with Julia and MATLAB this Sunday morning.  I found some code that prices European Options in MATLAB using Monte Carlo simulations over at computeraidedfinance.com and thought that I’d port this over to Julia.  Here’s the original MATLAB code:

function V = bench_CPU_European(numPaths)
%Simple European
steps = 250;
r = (0.05);
sigma = (0.4);
T = (1);
dt = T/(steps);
K = (100);

S = 100 * ones(numPaths,1);

for i=1:steps
 rnd = randn(numPaths,1);
 S = S .* exp((r-0.5*sigma.^2)*dt + sigma*sqrt(dt)*rnd);
end
V = mean( exp(-r*T)*max(K-S,0) )

I ran this a couple of times to see what results I should be getting and how long it would take for 1 million paths:

tic;bench_CPU_European(1000000);toc
V =
   13.1596
Elapsed time is 6.035635 seconds.
>> tic;bench_CPU_European(1000000);toc
V =
   13.1258
Elapsed time is 5.924104 seconds.
>> tic;bench_CPU_European(1000000);toc
V =
   13.1479
Elapsed time is 5.936475 seconds.

The result varies because this is a stochastic process but we can see that it should be around 13.1 or so and takes around 6 seconds on my laptop. Since it’s Sunday morning, I am feeling lazy and have no intention of considering whether this code is optimal right now. I’m just going to copy and paste it into a Julia file and hack at the syntax until it becomes valid Julia code. The following seems to work:

function bench_CPU_European(numPaths)

steps = 250
r = 0.05
sigma = .4;
T = 1;
dt = T/(steps)
K = 100;

S = 100 * ones(numPaths,1);

for i=1:steps
 rnd = randn(numPaths,1)
 S = S .* exp((r-0.5*sigma.^2)*dt + sigma*sqrt(dt)*rnd)
end
V = mean( exp(-r*T)*max(K-S,0) )
end

I ran this on Julia and got the following

julia> tic();bench_CPU_European(1000000);toc()
elapsed time: 36.259000062942505 seconds
36.259000062942505

julia> bench_CPU_European(1000000)
13.114855104505445

The Julia code appears to be valid and gives the correct result of 13.1-ish, but at 36.25 seconds it is around 6 times slower than the MATLAB version.  The dog needs walking so I’m going to think about this another time, but comments are welcome.
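Since the payoff is a plain European put, the Monte Carlo estimate can be sanity-checked against the closed-form Black-Scholes price. Here’s a sketch in Python/NumPy (not code I benchmarked, just an illustrative port of the same simulation) together with the analytic value, which comes out at about 13.15 for these parameters:

```python
import math
import numpy as np

def bs_put(S0, K, r, sigma, T):
    """Closed-form Black-Scholes price of a European put."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal CDF
    return K * math.exp(-r * T) * Phi(-d2) - S0 * Phi(-d1)

def mc_put(num_paths, steps=250, S0=100.0, K=100.0, r=0.05, sigma=0.4, T=1.0, seed=0):
    """Same geometric Brownian motion simulation as the MATLAB/Julia code above."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    S = np.full(num_paths, S0)
    for _ in range(steps):
        S *= np.exp((r - 0.5 * sigma**2) * dt
                    + sigma * math.sqrt(dt) * rng.standard_normal(num_paths))
    return math.exp(-r * T) * np.maximum(K - S, 0.0).mean()

print(bs_put(100, 100, 0.05, 0.4, 1))   # about 13.15
print(mc_put(100_000))                  # close to the analytic value
```

At a million paths the Monte Carlo noise is only a few hundredths, which is why the repeated MATLAB runs all land near 13.1.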

Update (9pm 7th October 2012):   I’ve just tried this Julia code on the Linux partition of the same laptop and 1 million paths took 14 seconds or so:

tic();bench_CPU_European(1000000);toc()
elapsed time: 14.146281957626343 seconds

I built this version of Julia from source and so it’s at the current bleeding edge (version 0.0.0+98589672.r65a1, Commit 65a1f3dedc, 2012-10-07 06:40:18). The code is still slower than the MATLAB version but better than the older Windows build.

Update: 13th October 2012

Over on the Julia mailing list, someone posted a faster version of this simulation in Julia

function bench_eu(numPaths)
    steps = 250
    r = 0.05
    sigma = .4;
    T = 1;
    dt = T/(steps)
    K = 100;

    S = 100 * ones(numPaths,1);

    t1 = (r-0.5*sigma.^2)*dt
    t2 = sigma*sqrt(dt)
    for i=1:steps
        for j=1:numPaths
            S[j] .*= exp(t1 + t2*randn())
        end
    end

    V = mean( exp(-r*T)*max(K-S,0) )
end

On the Linux partition of my test machine, this got through 1000000 paths in 8.53 seconds, very close to the MATLAB speed:

julia> tic();bench_eu(1000000);toc()
elapsed time: 8.534484148025513 seconds

It seems that, when using Julia, you need to unlearn everything you’ve ever learned about vectorisation in MATLAB.

Update: 28th October 2012

Members of the Julia team have been improving the performance of the randn() function used in the above code (see here and here for details).  Using the de-vectorised code above, execution time for 1 million paths in Julia is now down to 7.2 seconds on my machine on Linux.  Still slower than the MATLAB 2012a implementation but it’s getting there.  This was using Julia version 0.0.0+100403134.r0999, Commit 099936aec6 (2012-10-28 05:24:40).

tic();bench_eu(1000000);toc()
elapsed time: 7.223690032958984 seconds
7.223690032958984
  • Laptop model: Dell XPS L702X
  • CPU: Intel Core i7-2630QM @ 2GHz, software overclockable to 2.9GHz. 4 physical cores, 8 virtual cores due to Hyperthreading.
  • RAM: 8 GB
  • OS: Windows 7 Home Premium 64 bit and Ubuntu 12.04
  • MATLAB: 2012a
  • Julia: Original windows version was Version 0.0.0+94063912.r17f5, Commit 17f50ea4e0 (2012-08-15 22:30:58).  Several versions used on Linux since, see text for details.
October 4th, 2012 | Categories: Month of Math Software | Tags:

Welcome to the latest edition of A Month of Math Software where I take a look at all that is shiny and new in the computational mathematics world.  This one’s slightly late and so it not only covers all of September but also the first three days of October.  If you have any math software news that you’d like to share with the world, drop me a line and tell me all about it.  Enjoy!

MATLAB gets a Ribbon (sorry…Toolstrip)

A new version of MATLAB has been released and it has had some major cosmetic surgery.  The Mathworks insist on calling the new look in 2012b a Toolstrip but everyone else will call it a Ribbon.  Although they’ve been around for many years, ribbon based interfaces hit the big time when Microsoft used them for Office 2007, a decision that many, many, many, many, many, many, many people hated.  I hate them too and now I have to contend with one in MATLAB…and so do you because there is no way to switch back to the old interface.  The best you can do is minimise the thing and pretend it doesn’t exist.  Unhappy users abound (check out the user comments at http://blogs.mathworks.com/loren/2012/09/12/the-matlab-r2012b-desktop-part-1-introduction-to-the-toolstrip/ for example).  There have been a lot of other changes too which I’ll discuss in an upcoming review.

Do you use MATLAB? How do you feel about this new look?

MATLAB 2012b ribbon

Numerical Javascript!

Free and open source general purpose mathematics

Scilab 5.4

  • On 8th September, Sage version 5.3 was released.  Sage is an extremely powerful general purpose mathematics package based on Python and dozens of other open source projects.  The Sage development team like to say that instead of re-inventing the wheel they built a car!  A mighty fine one too, if you ask me.  What’s new in Sage 5.3
  • René Grothmann has updated his very nice, free Euler Math Toolbox.  At the time of writing it’s at version 18.8 but the updates come thick and fast.  The latest changes are always at http://euler.rene-grothmann.de/Programs/XX%20-%20Changes.html

The theory of numbers

  • Pari version 2.5.3 has been released. Pari is a free ‘computer algebra system designed for fast computations in number theory’
  • Magma version 2.18-10 was released in September.  Magma is a commercial system for algebra, number theory, algebraic geometry and algebraic combinatorics.

Numerical Libraries

  • The Intel Math Kernel Library (MKL) is now at version 11.0.  The MKL is a highly optimised numerical library for Intel platforms that covers subjects such as linear algebra, fast Fourier transforms and random numbers.  Find out what’s new at http://software.intel.com/en-us/articles/whats-new-in-intel-mkl/
  • LAPACK, the standard library for linear algebra on which libraries such as MKL and ACML are based, has been updated to version 3.4.2.  There is no new functionality; this is a bug-fix release.
  • The Numerical Algorithms Group (NAG) have released a major update to their commercial C library.  Mark 23 of the library includes lots of new stuff (345 new functions) such as a version of the Mersenne Twister random number generator with skip-ahead, additional functions for multidimensional integrals, a new suite of functions for solving boundary-value problems by an implementation of the Chebyshev pseudospectral method and loads more.  The press release is at http://www.nag.co.uk/numeric/CL/newatmark23 and the juicy detail is at http://www.nag.co.uk/numeric/CL/nagdoc_cl23/html/genint/news.html

Python

  • After the publication of the last Month of Math Software I learned about the death of John Hunter, author of matplotlib, due to complications arising from cancer treatment.  A tribute has been written by Fernando Perez.  My heart goes out to his family and friends.
  • After 8 months work, version 0.11 of SciPy is now available.  Go to http://docs.scipy.org/doc/scipy/reference/release.0.11.0.html for the good stuff which includes improvements to the optimisation routines and new routines for dense and sparse matrices among others.
  • A new major release of pandas is available. Pandas provides easy-to-use data structures and data analysis tools for Python.  See what’s new in 0.9.0 at http://pandas.pydata.org/pandas-docs/dev/whatsnew.html

Bits of this and that

And finally….

I am a big fan of the xkcd webcomic and so a recent question on the Mathematica StackExchange site instantly caught my eye.  Xkcd often publishes hand drawn graphs that look like this:
xkcd graph

The question asked: how could one produce graphs that look like this using Mathematica?  It didn’t take long before the community came up with some code that automatically produces plots like this:
xkcd mathematica

I am definitely going to use this style in my next presentation!  Not to be outdone, others have since done similar work in R, MATLAB and LaTeX.

September 26th, 2012 | Categories: math software, matlab, programming, random numbers | Tags:

Pop quiz: What does the following line of MATLAB code do?

rand('state',10)

If you said ‘It changes the seed of the random number generator to 10’ you get half a point.

‘Only half a point!?’ I hear you say accusingly ‘but it says so in my book [for example, 1-3], why not a full point?’

You only get a full point if you’d said something like ‘It changes the seed of the random number generator to 10 and it also changes the random number generator from the high quality, default Mersenne Twister generator to a lower quality legacy random number generator.’

OK, how about this one?

rand('seed',10)

This behaves in a very similar manner: it changes both the seed and the type of the underlying generator. However, the random number generator it switches to this time is an even older one that was introduced as far back as MATLAB version 4.  It is not very good at all by modern standards!

A closer look

Open up a fresh copy of a recent version of MATLAB and ask it about the random number generator it’s using

>> RandStream.getGlobalStream
ans =
mt19937ar random stream (current global stream)
             Seed: 0
  NormalTransform: Ziggurat

mt19937ar refers to a particular variant of the Mersenne Twister algorithm, an industrial-strength random number generator that’s used in many software packages and simulations.  It’s been the default generator in MATLAB since 2007a.  Change the seed using the modern (since 2011a), recommended syntax and ask again:

>> rng(10)
>> RandStream.getGlobalStream
ans =
mt19937ar random stream (current global stream)
             Seed: 10
  NormalTransform: Ziggurat

This is behaving exactly as you’d expect, you ask it to change the seed and it changes the seed…nothing more, nothing less. Now, let’s use the older syntax

>> rand('state',10)
>> RandStream.getGlobalStream
ans =
legacy random stream (current global stream)
  RAND algorithm: V5 (Subtract-with-Borrow), RANDN algorithm: V5 (Ziggurat)

The random number generator has completely changed!   We are no longer using the Mersenne Twister algorithm, we are now using a ‘subtract with borrow’ [see reference 4 for implementation details] generator which has been shown to have several undesirable issues [5-7].

Let’s do it again but this time using the even older ‘seed’ version:

>> rand('seed',10)
>> RandStream.getGlobalStream
ans =
legacy random stream (current global stream)
  RAND algorithm: V4 (Congruential), RANDN algorithm: V5 (Ziggurat)

Now, this random number generator is ancient by computing standards.  It also has a relatively tiny period of only 2 billion or so.  For details see [4].
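To see why the period tops out around 2 billion, here is a minimal sketch (in Python, purely for illustration, and not claiming to reproduce MATLAB’s exact V4 code) of a Lehmer multiplicative congruential generator of the kind described in reference [4], with multiplier 7^5 and modulus 2^31 − 1. The state is always an integer between 1 and m − 1, so there can be at most m − 1 ≈ 2.1 × 10^9 distinct states before the sequence repeats:

```python
A = 7**5            # multiplier, 16807
M = 2**31 - 1       # modulus, 2147483647 (a Mersenne prime)

def lehmer(state):
    """One step of a Lehmer generator; state / M gives a uniform in (0, 1)."""
    return (A * state) % M

state = 1
state = lehmer(state)     # 16807
state = lehmer(state)     # 282475249

# Because M is prime, Fermat's little theorem gives A**(M-1) % M == 1,
# so the cycle length divides M - 1: about 2.1 billion states at most.
assert pow(A, M - 1, M) == 1
```

Compare that with the Mersenne Twister’s period of 2^19937 − 1 and it is clear why these legacy generators were retired.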

Why this matters

Now, all of this is well documented so you may wonder why I am making such a big issue out of it.  Here are my reasons

  • I often get sent MATLAB code for the purposes of code review and optimisation.  I see the old seeding syntax a LOT and the program’s authors are often blissfully unaware of the consequences.
  • The old syntax looks like all it should do is change the seed.  It doesn’t!  Before 2007a, however, it did!
  • The old syntax is written in dozens of books because it was once the default, correct syntax to use.
  • Many users don’t read the relevant section of the MATLAB documentation because they have no idea that there is a potential issue.  They read a book or tutorial; it says to use rand('state',10), so they do.
  • MATLAB doesn’t use the old generators by default any more because they are not very good [4-7]!
  • Using these old generators may adversely affect the quality of your simulation.

The bottom line

Don’t do either of these to change the seed of the default generator to 10:

rand('state',10)
rand('seed',10)

Do this instead:

rng(10)

Only if you completely understand and accept the consequences of the older syntax should you use it.
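Incidentally, MATLAB is not alone here. NumPy went through a similar modern-versus-legacy split: np.random.seed drives the old global RandomState, while the recommended route is an explicit, seeded Generator object with no hidden global state. A small sketch:

```python
import numpy as np

# Modern style: explicit generator objects whose seeds you control.
rng_a = np.random.default_rng(10)
rng_b = np.random.default_rng(10)

draws_a = rng_a.random(3)
draws_b = rng_b.random(3)

# Same seed, same generator algorithm, same stream: fully reproducible
# without touching any global state.
assert (draws_a == draws_b).all()
```

The lesson is the same in both environments: say explicitly which generator you want as well as which seed.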

References

1. ‘MATLAB – A practical introduction to programming and problem solving’, 2009, Stormy Attaway

2. MATLAB Guide (Second Edition), 2005, Desmond Higham and Nicholas Higham

3. Essential MATLAB for Engineers and Scientists (Fourth Edition), 2009, Hahn and Valentine

4. Numerical Computing with MATLAB, 2004, Cleve Moler (available online)

5. Why does the random number generator in MATLAB fail a particular test of randomness? The Mathworks, retrieved 26th September 2012

6. A strong nonrandom pattern in Matlab default random number generator, 2006, Petr Savicky, retrieved 26th September 2012

7. Learning Random Numbers: A Matlab Anomaly, 2008, Petr Savicky and Marko Robnik-Šikonja, Applied Artificial Intelligence, Vol 22, Issue 3, pp 254-265

Other posts on random numbers in MATLAB

September 21st, 2012 | Categories: Financial Math, mathematica, programming | Tags:

One of my favourite investment news sites is The Motley Fool, which frequently runs articles such as 10 Shares Trading Near 52 Week Lows and 15 Shares Trading Near 52 Week Highs.  The idea behind such filtering is to seek out shares that have done particularly badly (or well) over the last year and then subject them to further analysis in order to find opportunities.  Thanks to Mathematica’s FinancialData command, it is rather easy to generate these lists yourself whenever you like.

15 Shares Trading Near 52 Week Highs

The original article selected the 15 largest cap shares from the FTSE All Share Index that were trading within 3% of their 52 week high at the time of publication.  Let’s see how to do that using Mathematica.

The following code returns the tickers of all shares from the FTSE All Share Index that are trading within 3% of their 52 week high.

percentage = 3;
all52weekHighs =
 Select[FinancialData["^FTAS", "Members"],
 Abs[FinancialData[#, "FractionalChangeHigh52Week"]] < (percentage/100.) &];

The variable all52weekHighs contains a list of stock tickers (e.g. LLOY.L) that meet our criteria.  The next thing to do is to find the market cap of each one:

all52WeekHighsWithCaps =
 Map[{#, FinancialData[#, "MarketCap"]} &, all52weekHighs];

This works fine for most shares. Lloyds TSB, for example, returns {“LLOY.L”, 2.7746*10^10} at the time of writing, but the MarketCap query fails for some tickers. For example, the market cap for HSL.L is not available and we get {“HSL.L”, Missing[“NotAvailable”]}.  Let’s discard these by insisting that we only consider stocks that have a numeric market cap.

Goodall52WeekHighsWithCaps =
  Select[all52WeekHighsWithCaps, NumberQ[#[[2]]] &];

We sort the list according to MarketCap:

sorted = Sort[Goodall52WeekHighsWithCaps, #1[[2]] > #2[[2]] &];

Let’s prettify the list a little by iterating over all tickers and replacing the ticker with the associated stock name. Also, let’s divide the market cap by 1 million to make it more readable

finallist =
 Map[{FinancialData[#[[1]], "Name"], #[[2]]/1000000} &, sorted];

Now, you may be wondering why I haven’t been showing you the output of these commands. This is simply because even this final list is rather large, at 118 entries at the time of writing:

Length[finallist]

118

The original article only considered the top 15 sorted by Market Cap so let’s show those. Market Caps are given in millions.

top15 = finallist[[1;;15]]//Grid

HSBC Holdings PLC    94159.
National Grid    24432.
Prudential PLC    21775.
Centrica PLC    17193.
Rolls Royce Group    16363.
WPP Plc    10743.
Experian PLC    10197.
Old Mutual PLC    8400.
Legal & General Group PLC    8036.
Wolseley PLC    7955.
Standard Life    6662.
J Sainsbury plc    6401.
Aggreko PLC    6391.
Land Securities Group PLC    6180.
British Land Co PLC    4859.

and we are done.
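For readers without Mathematica, the same select, discard, sort pipeline looks like this in plain Python. The quote data below is entirely made up for illustration (FinancialData is the only Mathematica-specific ingredient):

```python
# Hypothetical quote data standing in for FinancialData's answers:
# (ticker, |fractional change from 52-week high|, market cap or None).
quotes = [
    ("AAA.L", 0.020, 9.0e10),
    ("BBB.L", 0.100, 5.0e10),   # more than 3% off its high: filtered out
    ("CCC.L", 0.010, None),     # market cap unavailable: discarded
    ("DDD.L", 0.025, 2.0e10),
]

percentage = 3

# Keep shares within 3% of their 52-week high...
near_highs = [q for q in quotes if q[1] < percentage / 100.0]
# ...drop any whose market cap query failed...
with_caps = [(ticker, cap) for ticker, _, cap in near_highs if cap is not None]
# ...and rank by market cap, largest first.
ranked = sorted(with_caps, key=lambda pair: pair[1], reverse=True)

print(ranked)   # [('AAA.L', 90000000000.0), ('DDD.L', 20000000000.0)]
```

Swap in a real data source and a slice of the first 15 entries and you have the article’s list.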

September 16th, 2012 | Categories: Carnival of Math | Tags:

Welcome to the 90th edition of the Carnival of Mathematics and the first one hosted by me since I handed over the administrative reins to the good people of aperiodical.  The CoM is a great way to read about and promote mathematical blogging and has been running for over 5 years.  Hosted on a different blog each month, it covers the entire mathematical spectrum from simple mucking around with numbers right up to cutting edge research.

Writers can submit their own posts for inclusion in a carnival if they like, and anyone can submit any mathy post that they’ve found interesting: ideally, something written over the last month or so to keep it fresh.

If you want to keep up with the CoM, head over to its twitter feed or the dedicated page at aperiodical.

Trivia

Carnival tradition dictates that I post some trivia about this month’s edition number.  Here’s what I came up with for 90:

Neat Stuff

Computation

Puzzles and Games

  • Shecky R brings us Mind Wrenching – A self-referential logic puzzle that will give your brain cells a workout.
  • Brent Yorgey has been visualizing winning strategies for “nim-like” games and says ‘This is a post about visualizing winning strategies for certain games where players take turns removing counters from two piles.  The games make for fun games to actually play, and analyzing them can get quite interesting!’

Funnies

Art and Mathematics

Books

Tricks and Tactics

Topology

  • Mark Dominus of The Universe of Discourse gives us three posts this month: A two parter on topology and set theory (Click here for part 1 and here for part 2).

Teaching

  • Dan McQuillan gives us On Trigonometric Nostalgia and says ‘This is a post about fostering a problem-solving mentality in a world where we do not even understand how our own tools work. It superimposes our nostalgia for the world we used to know with our innate curiosity, which still exists. Basic trigonometry is still fun and still relevant. Indeed, one can always ask questions and calculate!’
  • Frederick Koh takes on the dot product in Understanding MATTERS (7) saying ‘This dot product concept involving parallel vector planes is rather fundamental, yet a handful of my students are unable to figure out how things exactly work. Hence I have decided to pen this detailed explanation in the hope that it will benefit not just my charges, but other math learners as well.’
  • Augustus Van Dusen has written the first in an upcoming series of posts that will prove properties of logarithmic and exponential functions. Augustus says ‘This particular post will focus on the properties of logarithmic functions of real variables. Students in advanced placement calculus in high school and beginning college students who are not math majors are the intended audience.’

Wolfram Rule 90

Comment

Not the only game in town

The Carnival of Mathematics isn’t the only mathematical blog carnival that’s doing the tour.  There’s also the fantastic monthly Math Teachers at Play.

End

That’s it for the 90th Edition.  Past editions written by me include 80, 76, 74 and 73 among others.  For future editions keep an eye on @carnivalofmath and aperiodical.com

September 4th, 2012 | Categories: Carnival of Math | Tags:

I’m hosting the 90th Carnival of Mathematics very soon.  If you have written (or read) a mathematics blog article over the last month and want to give it more attention, why not make a submission? The deadline for submissions is 10th September.

September 2nd, 2012 | Categories: Month of Math Software | Tags:

Welcome to the August edition of A Month of Math Software: a regular series of articles where I share what’s shiny and new in the world of mathematical software.  If you like what you see and want more, take a browse through the archives.  If you have some news that you think should go into next month’s edition, contact me to tell me all about it so I can tell the world.

This edition includes lots of new releases, blog posts and news about mathematics on mobile devices…enjoy!

Mobile Mathematics

August was a very big month for mobile mathematical applications with the following releases

General purpose mathematics

Do numerical computing using….Javascript!

  • The Numeric Javascript library has been updated to version 1.2.2. The main new feature is linear programming: the function is numeric.solveLP()

Mathematical software libraries

GPU Programming

GPU stands for Graphics Processing Unit but these days you can get a GPU to do a lot more than just graphics.  You could think of them as essentially massively parallel math co-processors that can make light work of certain operations.

  • Jacket is a commercial GPU Processing add-on for MATLAB.  In recent blog posts, the Jacket developers discuss SAR Image Formation Algorithms on the GPU and Option Pricing.
  • CULA is a set of commercial GPU-accelerated linear algebra libraries.  CULA-Dense is, as you might expect, for dense matrices and is now at version 15.  CULA-Sparse is at version S3.  I can’t find a ‘what’s new’ document but the main change seems to be the addition of support for NVIDIA’s Kepler architecture.  The CULA library can be called from C, C++, Fortran, MATLAB and Python, and is free for individual academic use.
  • GPULib is a commercial software library enabling GPU-accelerated calculations for IDL.  In a recent blog post, one of GPULib’s developers has been experimenting with OpenCL support.

Statistics

Academic codes and applications

  • Version 3.0 of the SCIP Optimization Suite has been released. According to the website, ‘SCIP is currently one of the fastest non-commercial mixed integer programming (MIP) solvers. It is also a framework for constraint integer programming and branch-cut-and-price’. Here are the all important Release Notes and Changelog.
  • Templates for First-Order Conic Solvers (TFOCS, pronounced tee-fox) is a software package that provides a set of templates, or building blocks, that can be used to construct efficient, customized solvers for a variety of models.  The latest version, 1.1a, was released back in February but I have only recently learned of it and so am including it in this month’s edition.  A set of demos and wiki for this software is available.
  • Version 1.0 of Blaze has been released.  Blaze is an open-source, high-performance C++ math library for dense and sparse arithmetic.  There is a getting started tutorial and a set of benchmarks.
August 29th, 2012 | Categories: general math, iPad, Latex, math software | Tags:

While on the train to work I came across a very interesting blog entry.  Full LaTeX support (on-device compilation and a .dvi viewer) is now available on the iPad courtesy of TeX Writer by FastIntelligence.  Here is the blog post telling us the good news: http://litchie.com/blog/?p=406

At the time of writing, the blog is down (Update: working again), possibly because of the click storm that my Twitter announcement caused, especially after it was picked up by @TeXtip.  So, here is the iTunes link: http://itunes.apple.com/us/app/tex-writer/id552717222?mt=8

I haven’t tried this yet but it looks VERY interesting.  If you get a chance to try it out, feel free to let me know how you get on in the comments section.

Update 1: This version of TeX Writer (1.1) cannot output to .pdf.  Only .dvi output is supported at the moment.

August 28th, 2012 | Categories: mathematica, natural language, programming, trivia | Tags:

Let’s use Mathematica to discover the longest English words whose letters are in alphabetical order.  The following command will give all such words:

DictionaryLookup[x__ /; Characters[x] == Sort[Characters[x]]]

I’m not going to show all of the output because there are 562 of them (including single letter words such as ‘I’ and ‘a’) as we can see by doing

 Length[
 DictionaryLookup[x__ /; Characters[x] == Sort[Characters[x]]]
]

562

The longest of these words has seven characters:

Max[Map[StringLength,
  DictionaryLookup[x__ /; Characters[x] == Sort[Characters[x]]]]]

7

It turns out that only one such word has the maximum 7 characters

DictionaryLookup[x__ /; Characters[x] == Sort[Characters[x]] && StringLength[x] == 7]

{"billowy"}

There are 34 such words that contain 6 characters

 DictionaryLookup[
 x__ /; Characters[x] == Sort[Characters[x]] && StringLength[x] == 6]

{"abbess", "Abbott", "abhors", "accent", "accept", "access", \
"accost", "adders", "almost", "begins", "bellow", "Bellow", "bijoux", \
"billow", "biopsy", "bloops", "cellos", "chills", "chilly", "chimps", \
"chinos", "chintz", "chippy", "chivvy", "choosy", "choppy", "Deimos", \
"effort", "floors", "floppy", "flossy", "gloppy", "glossy", "knotty"}

If you insist on all letters being different, there are 9:

DictionaryLookup[
 x__ /; Characters[x] == Sort[Characters[x]] && StringLength[x] == 6 &&
    Length[Union[Characters[x]]] == Length[Characters[x]]]

{"abhors", "almost", "begins", "bijoux", "biopsy", "chimps", \
"chinos", "chintz", "Deimos"}

How about where all the letters are in reverse alphabetical order with no repeats? The longest such words have 7 characters

Max[
 Map[StringLength,
  DictionaryLookup[
   x__ /; Characters[x] == Reverse[Sort[Characters[x]]]]]]

7

Here they are

DictionaryLookup[
 x__ /; Characters[x] == Reverse[Sort[Characters[x]]] &&
   StringLength[x] == 7 &&
   Length[Union[Characters[x]]] == Length[Characters[x]]]

{"sponged", "wronged"}
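The same searches are easy to reproduce in any language that can reach a word list. Here is the condition itself in Python, checked against words the Mathematica session found (lower-casing first sidesteps any case-ordering quirks, since `sorted` on a word’s characters plays the role of `Characters[x] == Sort[Characters[x]]`):

```python
def letters_in_order(word):
    """True when the word's letters appear in alphabetical order."""
    chars = list(word.lower())   # lower-case to sidestep case-ordering quirks
    return chars == sorted(chars)

def reverse_order_no_repeats(word):
    """True for reverse alphabetical order with all letters distinct."""
    chars = list(word.lower())
    return chars == sorted(chars, reverse=True) and len(set(chars)) == len(chars)

print(letters_in_order("billowy"))          # True
print(reverse_order_no_repeats("sponged"))  # True
print(reverse_order_no_repeats("wronged"))  # True
```

Filtering a dictionary file through these predicates reproduces the lists above.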
August 24th, 2012 | Categories: CUDA, GPU, OpenCL, parallel programming, programming | Tags:

Updated 26th March 2015

I’ve been playing with AVX vectorisation on modern CPUs off and on for a while now and thought that I’d write up a little of what I’ve discovered.  The basic idea of vectorisation is that each processor core in a modern CPU can operate on multiple values (i.e. a vector) simultaneously per instruction cycle.

Modern processors have 256-bit-wide vector units, which means that each CORE can perform up to 16 double precision or 32 single precision floating point operations (FLOPs) per clock cycle. So, on a quad core CPU that’s typically found in a decent laptop you have 4 vector units (one per core) and could perform up to 64 double precision FLOPs per cycle. The Intel Xeon Phi accelerator unit has even wider vector units: 512-bit!

This all sounds great but how does a programmer actually make use of this neat hardware trick?  There are many routes:

Intrinsics

At the ‘close to the metal’ level, you code for these vector units using instructions called AVX intrinsics.  This is relatively difficult and leads to non-portable code if you are not careful.

Auto-vectorisation in compilers

Since working with intrinsics is such hard work, why not let the compiler take the strain? Many modern compilers can automatically vectorise your C, C++ or Fortran code, including gcc, PGI and Intel. Sometimes all you need to do is add an extra switch at compile time and reap the speed benefits. In truth, vectorisation isn’t always automatic and the programmer needs to give the compiler some assistance, but it is a lot easier than hand-coding intrinsics.

Intel SPMD Program Compiler (ispc)

There is a midway point between automagic vectorisation and having to use intrinsics. Intel have a free compiler called ispc (http://ispc.github.com/) that allows you to write compute kernels in a modified subset of C. These kernels are then compiled to make use of vectorised instruction sets. Programming using ispc feels a little like using OpenCL or CUDA. I figured out how to hook it up to MATLAB a few months ago and developed a version of the square root function that is almost twice as fast as MATLAB’s own version for Sandy Bridge i7 processors.

OpenMP

OpenMP is an API specification for parallel programming that’s been supported by several compilers for many years. OpenMP 4 was released in mid-2013 and included support for vectorisation.

Vectorised Libraries

Vendors of numerical libraries are steadily applying vectorisation techniques in order to maximise performance.  If the execution speed of your application depends upon these library functions, you may get a significant speed boost simply by updating to the latest version of the library and recompiling with the relevant compiler flags.
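The effect is easy to feel even from a high-level language. In this Python sketch (an analogy, not SSE/AVX programming itself), np.exp hands the whole array to a compiled kernel that the library vendor has vectorised, while the pure-Python loop pays full interpreter overhead per element; both produce identical answers:

```python
import math
import numpy as np

x = np.linspace(0.0, 1.0, 100_000)

# One element per iteration, all in the interpreter.
loop_result = np.array([math.exp(v) for v in x])

# One call: the loop runs inside the library's compiled, vectorised kernel.
vec_result = np.exp(x)

assert np.allclose(loop_result, vec_result)
```

Timing the two typically shows an order of magnitude or more between them, which is exactly the kind of free speedup a better-vectorised library version can deliver without any source changes.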

CUDA for x86

Another route to vectorised code is to make use of the PGI Compiler’s support for x86 CUDA.  What you do is take CUDA kernels written for NVIDIA GPUs and use the PGI Compiler to compile these kernels for x86 processors.  The resulting executables take advantage of vectorisation.  In essence, the vector units of the CPU are acting like CUDA cores, which they sort of are anyway!

The PGI compilers also have technology which they call PGI Unified binary which allows you to use NVIDIA GPUs when present or default to using multi-core x86 if no GPU is present.

  • PGI CUDA-x86 – PGI’s main page for their CUDA on x86 technologies

OpenCL for x86 processors

Yet another route to vectorisation would be to use Intel’s OpenCL implementation, which takes OpenCL kernels and compiles them down to take advantage of vector units (http://software.intel.com/en-us/blogs/2011/09/26/autovectorization-in-intel-opencl-sdk-15/).  The AMD OpenCL implementation may also do this but I haven’t tried it and haven’t had the chance to research it yet.

WalkingRandomly posts

I’ve written a couple of blog posts that made use of this technology.

Miscellaneous resources

There is other stuff out there but the above covers everything that I have used so far.  I’ll finish by saying that everyone interested in vectorisation should check out this website…It’s the bible!

Research Articles on SSE/AVX vectorisation

I found the following research articles useful/interesting.  I’ll add to this list over time as I dig out other articles.