No Fortran? No data science in R and Python!

November 13th, 2020 | Categories: Fortran

Earlier this week Apple announced their new, Arm-based ‘Apple Silicon’ machines to the world in a slick marketing event that had many of us reaching for our credit cards. Simultaneously, the Numerical Algorithms Group (NAG) announced that they had ported their Fortran compiler to the new platform. At the time of writing, this is the only Fortran compiler publicly available for Apple Silicon, although that will likely change soon as open source Fortran compilers get updated.

Fortran? OK Boomer!

At over 60 years old, Fortran is one of the oldest programming languages that continues to be actively developed and used (the latest language specification is Fortran 2018). Routinely mocked by software engineers as old-skool (including by me: over a decade ago I suggested that it shouldn’t be taught to undergraduates), Fortran is the language that everyone overlooks as they spend their days sipping flat whites in Starbucks while hacking away in MATLAB, R or Python on shiny MacBook Airs.

It can come as quite a shock, therefore, to discover that many of our favourite data science tools simply cannot work natively on any system that doesn’t have a Fortran compiler!

It’s Fortran all the way down

Much of the numerical functionality we routinely use today was developed decades ago and released in Fortran. More modern systems, such as R, make direct use of a lot of this code because it is highly performant and, perhaps more importantly, has been battle-tested in production for decades. Numerical computing is hard (even when all of your instincts suggest otherwise) and when someone demonstrably does it right, it makes good sense to reuse rather than reinvent.

As a result, with no Fortran, there’s no R.

[Image: R_fortran]

The Python crowd don’t get away with it either. Here are the GitHub stats for SciPy as of today:

[Image: scipi_fortran, GitHub language statistics for SciPy]

Of course, no SciPy means you also can’t have anything that depends on SciPy, including things like Keras or scikit-learn. Also, if you want good performance for linear algebra operations (which underpin a huge number of data science and ML algorithms) then you’ll need a good BLAS implementation. Many projects use OpenBLAS (it’s a popular option in NumPy, for example) which…you guessed it…is almost 49% Fortran.
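If you would like to sanity-check figures like these against a local checkout of a project, something along the following lines might do as a rough sketch. It is not how GitHub does it: GitHub's numbers come from its Linguist tool, which is considerably more careful about what it counts. Here the `EXTENSIONS` map and the `language_shares` function are hypothetical names made up for illustration, the extension list is deliberately tiny, and files with unrecognised extensions are simply ignored, so expect the percentages to differ from what GitHub reports.

```python
import os
import sys
from collections import Counter

# Hypothetical, deliberately small extension-to-language map (an assumption
# for this sketch; GitHub's Linguist uses a far richer set of rules).
EXTENSIONS = {
    ".f": "Fortran",
    ".f90": "Fortran",
    ".f95": "Fortran",
    ".for": "Fortran",
    ".c": "C",
    ".h": "C",
    ".py": "Python",
    ".r": "R",
}


def language_shares(root):
    """Return {language: percentage of bytes} for recognised files under root."""
    sizes = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Lower-case the extension so that ".F" and ".R" are recognised too.
            ext = os.path.splitext(name)[1].lower()
            language = EXTENSIONS.get(ext)
            if language is not None:
                sizes[language] += os.path.getsize(os.path.join(dirpath, name))
    total = sum(sizes.values())
    if total == 0:
        return {}
    return {lang: 100.0 * size / total for lang, size in sizes.items()}


if __name__ == "__main__":
    # Usage: python language_shares.py path/to/checkout
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    shares = language_shares(root)
    for lang, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{lang:10s} {share:5.1f}%")
```

Counting bytes rather than files keeps one enormous legacy Fortran source from being weighed the same as a three-line Python helper, which is the spirit of the stats shown above.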

Emulation for now

NAG’s Fortran compiler is excellent (it routinely tops independent charts for its checking and Fortran standards compliance, for example) but it is commercial. The community needs, and will demand, open source (or at least free) Fortran compilers if data scientists are ever going to realise the full potential of Apple’s new hardware, and I have no doubt that these are on the way. Other major silicon providers (e.g. Intel, AMD, NEC and NVIDIA/PGI) have their own Fortran compilers that co-exist with the open ones. Perhaps Apple should join the club (I suggest they talk to NAG!).

Until this is resolved, most of us will be relying on Rosetta 2 emulation…which, thankfully, is apparently pretty good!

  1. NAGFan
    November 13th, 2020 at 20:30

    Nice blog Mike,
    Numerical computing is indeed hard. One of your colleagues, Mick Pont (a legend, now retired), always spoke very well about this… I am sure you have seen his blog but, for the benefit of others, check out https://www.nag.com/blog/bitwise-reproducibility-nag-libraries

  2. Doug Hill
    November 15th, 2020 at 19:10
  3. Mike Croucher
    November 15th, 2020 at 20:26

    Indeed, but it’s not yet finished according to this November 2nd, 2020 post from the R project: https://developer.r-project.org/Blog/public/2020/11/02/will-r-work-on-apple-silicon/index.html