## Full LaTeX on iPad with on device compilation and .dvi viewer

August 29th, 2012

While on the train to work I came across a very interesting blog entry.  Full LaTeX support (on-device compilation and a .dvi viewer) is now available on the iPad courtesy of TeX Writer by FastIntelligence.  Here is the blog post with the good news: http://litchie.com/blog/?p=406

At the time of writing, the blog is down (Update: working again), possibly because of the click storm that my Twitter announcement caused, especially after it was picked up by @TeXtip.  So, here is the iTunes link: http://itunes.apple.com/us/app/tex-writer/id552717222?mt=8

I haven’t tried this yet but it looks VERY interesting.  If you get a chance to try it out, feel free to let me know how you get on in the comments section.

Update 1: This version of TeX Writer (1.1) cannot output to .pdf.  Only .dvi output is supported at the moment.

## Computational word trivia with Mathematica – longest words with letters in alphabetical order

August 28th, 2012

Let’s use Mathematica to discover the longest English words whose letters are in alphabetical order.  The following command gives all such words:

DictionaryLookup[x__ /; Characters[x] == Sort[Characters[x]]]

I’m not going to show all of the output because there are 562 of them (including single letter words such as ‘I’ and ‘a’) as we can see by doing

Length[
 DictionaryLookup[x__ /; Characters[x] == Sort[Characters[x]]]
]

562

The longest of these words has seven characters:

Max[Map[StringLength,
DictionaryLookup[x__ /; Characters[x] == Sort[Characters[x]]]]]

7

It turns out that only one such word has the maximum of seven characters:

DictionaryLookup[x__ /; Characters[x] == Sort[Characters[x]] && StringLength[x] == 7]

{"billowy"}

There are 34 such words that contain six characters:

DictionaryLookup[
 x__ /; Characters[x] == Sort[Characters[x]] && StringLength[x] == 6]

{"abbess", "Abbott", "abhors", "accent", "accept", "access",
 "accost", "adders", "almost", "begins", "bellow", "Bellow", "bijoux",
 "billow", "biopsy", "bloops", "cellos", "chills", "chilly", "chimps",
 "chinos", "chintz", "chippy", "chivvy", "choosy", "choppy", "Deimos",
 "effort", "floors", "floppy", "flossy", "gloppy", "glossy", "knotty"}

If you insist on all letters being different, there are 9:

DictionaryLookup[
x__ /; Characters[x] == Sort[Characters[x]] && StringLength[x] == 6 &&
Length[Union[Characters[x]]] == Length[Characters[x]]]

{"abhors", "almost", "begins", "bijoux", "biopsy", "chimps",
 "chinos", "chintz", "Deimos"}

How about words where all the letters are in reverse alphabetical order, with no repeats? The longest such words have seven characters:

Max[
Map[StringLength,
DictionaryLookup[
x__ /; Characters[x] == Reverse[Sort[Characters[x]]]]]]

7

Here they are

DictionaryLookup[
x__ /; Characters[x] == Reverse[Sort[Characters[x]]] &&
StringLength[x] == 7 &&
Length[Union[Characters[x]]] == Length[Characters[x]]]

{"sponged", "wronged"}
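
The same idea is easy to replicate outside Mathematica. Here is a rough Python sketch of the letters-in-order test used in the queries above, run against a small hand-picked sample list rather than a real dictionary (a full run would load a system word list such as /usr/share/dict/words):

```python
# Check whether a word's letters appear in alphabetical order,
# mirroring the Characters[x] == Sort[Characters[x]] test above.
def letters_in_order(word):
    chars = list(word.lower())
    return chars == sorted(chars)

# A small sample list; a real run would use a full dictionary file.
sample = ["billowy", "almost", "zebra", "chintz", "sponged"]
print([w for w in sample if letters_in_order(w)])
# ['billowy', 'almost', 'chintz']
```

Note that "sponged" fails the test because its letters are in *reverse* alphabetical order, matching the final Mathematica query.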

## Vectorising code to take advantage of modern CPUs (AVX and SSE)

August 24th, 2012

Updated 26th March 2015

I’ve been playing with AVX vectorisation on modern CPUs off and on for a while now and thought that I’d write up a little of what I’ve discovered.  The basic idea of vectorisation is that each processor core in a modern CPU can operate on multiple values (i.e. a vector) simultaneously per instruction cycle.

Modern processors have 256-bit wide vector units: each register holds four double precision (or eight single precision) values, and with two fused multiply-add (FMA) units per core that means each CORE can perform up to 16 double precision or 32 single precision floating point operations (FLOPS) per clock cycle. So, on the quad-core CPU typically found in a decent laptop you have four vector units (one per core) and could perform up to 64 double precision FLOPS per cycle. The Intel Xeon Phi accelerator has even wider vector units — 512-bit!
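
The arithmetic behind those peak figures can be sketched as a back-of-the-envelope calculation. This assumes two FMA units per core (true of Haswell-class chips; earlier AVX processors without FMA peak at half these numbers):

```python
def peak_flops_per_cycle(vector_bits, element_bits, fma_units=2):
    """Peak FLOPS per core per clock cycle, assuming FMA hardware."""
    lanes = vector_bits // element_bits   # values held in one vector register
    return lanes * fma_units * 2          # each FMA counts as two FLOPS

print(peak_flops_per_cycle(256, 64))      # double precision per core: 16
print(peak_flops_per_cycle(256, 32))      # single precision per core: 32
print(4 * peak_flops_per_cycle(256, 64))  # quad-core double precision peak: 64
```

The same function gives 32 double precision FLOPS per core per cycle for the 512-bit vector units on Xeon Phi.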

This all sounds great, but how does a programmer actually make use of this neat hardware trick?  There are many routes:

Intrinsics

At the ‘close to the metal’ level you code for these vector units using instructions called AVX intrinsics.  This is relatively difficult and leads to non-portable code if you are not careful.

Auto-vectorisation in compilers

Since working with intrinsics is such hard work, why not let the compiler take the strain? Many modern compilers, including gcc, PGI and Intel, can automatically vectorise your C, C++ or Fortran code. Sometimes all you need to do is add an extra switch at compile time and reap the speed benefits. In truth, vectorisation isn’t always automatic and the programmer sometimes needs to give the compiler some assistance, but it is a lot easier than hand-coding intrinsics.

Intel SPMD Program Compiler (ispc)

There is a midway point between automagic vectorisation and having to use intrinsics. Intel have a free compiler called ispc (http://ispc.github.com/) that allows you to write compute kernels in a modified subset of C. These kernels are then compiled to make use of vectorised instruction sets. Programming using ispc feels a little like using OpenCL or CUDA. I figured out how to hook it up to MATLAB a few months ago and developed a version of the square root function that is almost twice as fast as MATLAB’s own version on Sandy Bridge i7 processors.

OpenMP

OpenMP is an API specification for parallel programming that has been supported by several compilers for many years. OpenMP 4.0, released in mid-2013, added support for vectorisation.

Vectorised Libraries

Vendors of numerical libraries are steadily applying vectorisation techniques in order to maximise performance.  If the execution speed of your application depends upon these library functions, you may get a significant speed boost simply by updating to the latest version of the library and recompiling with the relevant compiler flags.
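
As a concrete illustration of the vectorised-library route, NumPy is a familiar example: a single library call operates on a whole array with no explicit loop, and the loop underneath can use SIMD instructions where the build and CPU support them (whether it actually does is an assumption that depends on your installation):

```python
import numpy as np

# One vectorised library call processes the whole array at once.
# Whether SIMD instructions are used under the hood depends on the
# NumPy build and the CPU it runs on.
x = np.arange(1.0, 5.0)   # [1. 2. 3. 4.]
y = np.sqrt(x)
print(y)
```

From the user's point of view nothing changes when the library gains SIMD support, which is exactly why a simple upgrade can yield a speed boost.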

CUDA for x86

Another route to vectorised code is to make use of the PGI compiler’s support for x86 CUDA.  You take CUDA kernels written for NVIDIA GPUs and use the PGI compiler to compile these kernels for x86 processors.  The resulting executables take advantage of vectorisation.  In essence, the vector units of the CPU act like CUDA cores, which they sort of are anyway!

The PGI compilers also have technology which they call PGI Unified binary which allows you to use NVIDIA GPUs when present or default to using multi-core x86 if no GPU is present.

• PGI CUDA-x86 – PGI’s main page for their CUDA on x86 technologies

OpenCL for x86 processors

Yet another route to vectorisation is Intel’s OpenCL implementation, which compiles OpenCL kernels down to code that takes advantage of the vector units (http://software.intel.com/en-us/blogs/2011/09/26/autovectorization-in-intel-opencl-sdk-15/).  The AMD OpenCL implementation may also do this but I haven’t tried it and haven’t had a chance to research it yet.

WalkingRandomly posts

I’ve written a couple of blog posts that made use of this technology.

Miscellaneous resources

There is other stuff out there but the above covers everything that I have used so far.  I’ll finish by saying that everyone interested in vectorisation should check out this website.  It’s the bible!

Research Articles on SSE/AVX vectorisation

I found the following research articles useful/interesting.  I’ll add to this list over time as I dig out other articles.

## WalkingRandomly is now on Facebook

August 22nd, 2012

Someone recently asked me if WalkingRandomly.com had a Facebook page.  Since it wasn’t much effort to create one, I have now done so.  I have no idea if this will be of any use to anyone but a first stab at it is at http://www.facebook.com/Walkingrandomly

Now I have to decide what to do with it. Does anyone have any thoughts on a blog such as this having its own Facebook page?  Is it a good idea?  Will anyone make use of it or is it just pointless?

## Network license expiry warnings: a plea to software vendors

August 21st, 2012

I work for a very large UK university as part of a team that is responsible for the network licenses of dozens of software applications serving thousands of clients.  Many of these licenses need to be renewed annually.  From a technical point of view this is usually fairly trivial, but various management and administrative steps are needed to ensure that user-side disruption is kept as low as possible.

Each software vendor has its own process for managing license renewal.  Some of them do it very well and some of them seem to work to make our life as difficult as possible, but there is one problem shared by many applications.

In a misguided attempt to be helpful to the user, many network licensed applications display a launch-time message if the network license is set to expire ‘soon’ (where ‘soon’ might be as much as 90 days in the future).  The user sees something like this:

Can you guess what happens when several hundred users see that message?  Yep, we get snowed under with panicked queries about a ‘problem’ that does not exist.  The license renewal is scheduled and will happen behind the scenes without the user ever needing to know anything about it.

So, to all software vendors whose applications do this for network licenses: PLEASE SHUT UP!

## WalkingRandomly is 5 years old today

August 20th, 2012

My first post on WalkingRandomly was on 20th August 2007 and, to be honest, I had no idea what I was going to write about or if I would keep at it; I certainly didn’t think that I would still be going strong 5 years later.  Although I love to write, there is one reason why I keep on pounding out the posts here…you!

To all commenters, twitter followers, emailers and anyone else who has contacted me over the last 5 years regarding this blog….THANK YOU!  Without you, I would have given up years ago because no one wants to write reams of stuff that doesn’t get read.

There are various things one could write about for a post like this.  Maybe I could post reader statistics for example or discuss what I’ve got out of blogging over the last few years.  Alternatively, I could go down the route of posting links to some of the posts I’m most proud of (or otherwise!).  I considered all these things but realised that what was most important to me was just to say thank you to all of you.

Thanks!

## The Julia Language hits a new speed milestone

August 14th, 2012

I first mentioned Julia, a new language for high performance scientific computing, back in the February edition of A Month of Math Software and it certainly hasn’t stood still since then.  A WalkingRandomly reader, Ohad, recently contacted me to tell me about a forum post announcing some Julia speed improvements.

Julia has a set of micro-benchmarks and the slowest of them is now only two times slower than the equivalent C.  That’s close to compiled-language performance from an easy-to-use dynamic language.  Astonishingly, Julia is faster than gfortran in a couple of instances.  Nice work!

Comparison times between Julia and other scientific scripting languages (MATLAB, Python and R for instance) for these micro-benchmarks are posted on Julia’s website.  The Julia team have included the full benchmark source code used for all languages so if you are an expert in one of them, why not take a look at how they have represented your language and see what you think?

Let me know if you’ve used Julia at all, I’m interested in what people think of it.

## Variable precision QR decomposition in MATLAB

August 13th, 2012

Someone recently contacted me complaining that MATLAB could not do a QR factorisation of a variable precision arithmetic matrix.  Double precision matrices work fine of course:

>> A=[2 1 3;-1 0 7; 0 -1 -1];
>> [Q R]=qr(A)

Q =
   -0.8944   -0.1826    0.4082
    0.4472   -0.3651    0.8165
         0    0.9129    0.4082

R =
   -2.2361   -0.8944    0.4472
         0   -1.0954   -4.0166
         0         0    6.5320

Variable precision matrices, however, do not (I’m using MATLAB 2012a and the Symbolic Math Toolbox here):

>> a=vpa([2 1 3;-1 0 7; 0 -1 -1]);
>> [Q R]=qr(a)
Undefined function 'qr' for input arguments of type 'sym'.

It turns out that MATLAB and the Symbolic Math Toolbox CAN do variable precision QR factorisation; it’s just hidden a bit. The following very simple function, vpa_qr.m, shows how to get at it:

function [q, r] = vpa_qr(x)
    result = feval(symengine, 'linalg::factorQR', x);
    q = result(1);
    r = result(2);
end

Let’s see how it does:

>> a=vpa([2 1 3;-1 0 7; 0 -1 -1]);
>> [Q R]=vpa_qr(a);

I’ve suppressed the output because it’s so large but it has definitely worked. Let’s take a look at the first element of Q for example

>> Q(1)

ans =
0.89442719099991587856366946749251

This is correct to the default number of variable precision digits, 32.  Of course, we could change this to anything we like using the digits function.
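
As an aside, a similar arbitrary-precision QR can be sketched in Python with SymPy's QRdecomposition. This is purely illustrative and assumes you have SymPy installed; it is not the symengine route used above:

```python
from sympy import Matrix

# Exact symbolic QR of the same matrix, then evaluate the first
# element of Q to 32 significant digits for comparison with the
# MATLAB vpa result above.
A = Matrix([[2, 1, 3], [-1, 0, 7], [0, -1, -1]])
Q, R = A.QRdecomposition()
print(Q[0, 0].evalf(32))  # 0.89442719099991587856366946749251
```

Here Q[0, 0] is exactly 2/sqrt(5), so the digits agree with MATLAB's variable precision answer.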

## Making Ubuntu 12.04 usable

August 10th, 2012

I recently installed Ubuntu 12.04 on my laptop.  I gave Unity a chance for a few days just in case it had improved since I last used it but still found it unusable.  The following tweaks made Ubuntu usable again (for me at least).

That was pretty much it and I’m very happy with the result.  Do you use Ubuntu?  If so, are there any tweaks that you simply must make to the default setup before you feel that it’s usable for you?

## What are the best commercial non-MathWorks MATLAB toolboxes?

August 8th, 2012

The MathWorks sell dozens of toolboxes for MATLAB but they are not the only ones doing it.  Many third-party vendors also sell MATLAB toolboxes and many of them are of extremely high quality.  Here are some of my favourites:

• The NAG Toolbox for MATLAB – Over 1500 high quality numerical routines from the Numerical Algorithms Group.  Contains local and global optimisation, statistics, finance, wavelets, curve fitting, special functions and much, much more. Superb!
• AccelerEyes Jacket – Very fast and comprehensive GPU computing toolbox for MATLAB.  I’ve used it a lot.
• Multi-precision toolbox for MATLAB – A colleague at Manchester recently told me about this one as his group uses it a lot in their research.  I’ve yet to play with it but it looks great.

Which commercial, 3rd party toolboxes do you use/rate and why?

If you like this list, you may also like my list of high quality, free MATLAB toolboxes