How R relates to other technologies

October 13th, 2016 | Categories: math software, R, RSE, Scientific Software, University of Sheffield

I was recently invited to give a talk at the Sheffield R Users Group and decided to give a brief overview of how R relates to other technologies. Subjects included Mathematica’s integration of R; Intel’s compilers and Math Kernel Library (MKL) and how they can make R faster; and a range of Microsoft technologies including R Tools for Visual Studio, Microsoft R Open and MRAN for reproducibility. I also touched on the NAG Library, Maple’s code generation for R, GPUs and Spark.
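
To make the MRAN reproducibility point concrete: Microsoft takes daily snapshots of CRAN, and the checkpoint package installs everything a project needs from the snapshot of a chosen date. A minimal sketch, assuming the checkpoint package is installed (the date below is illustrative):

    # Point the library path at the MRAN snapshot for a fixed date, so the
    # analysis sees the same package versions no matter when it is re-run.
    library(checkpoint)
    checkpoint("2016-10-01")

    # Packages loaded from here on come from the snapshot, not from
    # whatever CRAN happens to hold today.
    library(ggplot2)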

Did I miss anything? If you were to give a similar talk, what might you have included?

  1. oversky
    October 13th, 2016 at 11:48

    How does the “Accelerated versions of R for Iceberg” build compare with Microsoft R Open?
    Which is faster?
    Is there any reason not to use them, such as compatibility issues?

  2. Chris Rackauckas
    October 13th, 2016 at 15:48

    I would add Julia’s interop with R. A lot of people are heavy R users but are interested in Julia for faster development than they would get from having to use C++. RCall is a very mature library for calling R from Julia: either use the R"" string macro or enter an R terminal by typing $ at the empty REPL. You also have RJulia for the other direction, letting Julia be a drop-in replacement for C++ and giving R access to Julia’s libraries.

  3. Mike Croucher
    October 14th, 2016 at 12:54

    @oversky
    For linear algebra, they are likely the same speed. Both are using the Intel MKL to do the heavy lifting.

    The build I did on Iceberg used both the Intel MKL and the Intel compiler. I’m not sure which compiler Microsoft R Open uses. It’s possible that one is faster than the other in certain circumstances (at a guess, anything that uses Rcpp and is numerically intensive is a candidate) but I have no data.
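
    For anyone who wants to experiment with the Intel compiler for Rcpp-heavy code, the usual route is a user-level Makevars file, which R consults when building packages. A minimal sketch; the compiler names and flags are illustrative and depend on your installation:

        # ~/.R/Makevars -- tell R to build packages with the Intel compilers.
        CC = icc
        CXX = icpc
        CFLAGS = -O3 -xHost
        CXXFLAGS = -O3 -xHost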

    There should be no compatibility issues (all the tests pass in my Iceberg build, for example) with the following caveats:

    – Different implementations of LAPACK might return different (but still mathematically correct!) results for some operations where the answer is not guaranteed to be unique (see the sketch below).
    – Most Rcpp routines are probably developed with the gcc compiler in mind, so there may be differences between gcc and the Intel compiler that stop some things compiling. I haven’t found any yet, though.
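
    As a small illustration of the first caveat: eigenvectors are only defined up to sign, so two correct LAPACK builds can legitimately return different raw numbers for the same matrix. A minimal sketch:

        # Eigenvectors are unique only up to sign (and scale): one
        # BLAS/LAPACK build may return v where another returns -v.
        A <- matrix(c(2, 1, 1, 2), nrow = 2)
        e <- eigen(A)
        e$vectors                # signs may differ between builds

        # Both answers satisfy the defining equation A v = lambda v:
        A %*% e$vectors[, 1] - e$values[1] * e$vectors[, 1]   # ~ zero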

  4. Mike Croucher
    October 14th, 2016 at 12:54

    @Chris Rackauckas

    I like your blog’s title :)

  5. Michael
    October 15th, 2016 at 01:41

    Recently we have been working with Sisense for business intelligence, and it has R integration available for heavy operations.

  6. Dave Love
    November 30th, 2016 at 14:24

    I think the bit about Intel tools is drawing the wrong conclusions.

    I assume Iceberg is running something RHEL-like. In that case, the only reason for needing anything other than the system R packages for fast linear algebra is that Fedora/EPEL doesn’t have a sensible linear algebra policy like Debian’s, and some things are unfortunately packaged. Anyhow, the R packages in the epel-testing repo now use an essentially optimal level-3 BLAS for x86 SIMD below AVX-512 (which is infinitely faster than MKL on non-x86 architectures, of course, since MKL doesn’t run there).

    Previously you could rely on a hack to fix the use of a slow BLAS system-wide, and this business still needs to be tackled in Fedora with policy. See https://loveshack.fedorapeople.org/blas-subversion.html for discussion, figures, and a circular reference.
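
    As an aside, recent versions of R (3.4 and later, so newer than the builds discussed above) make it easy to check which BLAS and LAPACK a session actually resolves to; a hedged sketch:

        # sessionInfo() in recent R reports the BLAS and LAPACK shared
        # libraries the running session is linked against.
        si <- sessionInfo()
        si$BLAS     # e.g. /usr/lib64/libopenblas.so
        si$LAPACK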

    For use of R on an HPC system, what about pbdR?
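
    pbdR (“Programming with Big Data in R”) runs R in SPMD style over MPI. A minimal sketch using its pbdMPI package, assuming a working MPI installation:

        # hello_pbd.R -- launch with: mpirun -np 4 Rscript hello_pbd.R
        library(pbdMPI)

        init()                              # join the MPI communicator
        comm.print(comm.size())             # rank 0 prints the number of ranks
        comm.cat("Hello from rank", comm.rank(), "\n", all.rank = TRUE)
        finalize()                          # shut MPI down cleanly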

  7. Mike Croucher
    December 1st, 2016 at 12:29

    Thanks Dave, great information. I’ll take a look at building with OpenBLAS next time.

  8. Mike Croucher
    December 1st, 2016 at 12:57

    This March 2016 article from Intel shows MKL beating OpenBLAS across the board: https://software.intel.com/en-us/articles/performance-comparison-of-openblas-and-intel-math-kernel-library-in-r

    It used OpenBLAS 0.2.14, which is older than the version referenced in your post, and there are no links to the benchmark code.

    We have a new Haswell system under development. It will be interesting (to me, at least) to rerun these benchmarks, including OpenBLAS, on that system.
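
    For reference, the first sanity check I’d run on the new system is a quick dense matrix-multiply timing; the size below is illustrative, not taken from the Intel article:

        # Time an n x n double-precision matrix multiply. The work is
        # dominated by the linked BLAS's dgemm, so this is a quick proxy
        # for BLAS performance.
        set.seed(42)
        n <- 4000
        A <- matrix(rnorm(n * n), nrow = n)
        B <- matrix(rnorm(n * n), nrow = n)

        elapsed <- system.time(A %*% B)["elapsed"]
        gflops  <- 2 * n^3 / elapsed / 1e9   # a dense multiply needs ~2n^3 flops
        cat("Elapsed:", elapsed, "seconds; approx.", round(gflops, 1), "GFLOPS\n")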

  9. Dave Love
    December 2nd, 2016 at 18:37

    @Mike Croucher

    But you could just do a yum/dnf/apt-get install.

  10. Dave Love
    December 2nd, 2016 at 18:42

    @Mike Croucher
    Proper reporting of experimental results: we’ve heard of it. As an unapologetic experimentalist, I rarely see “benchmark” results that report enough for useful insight, even when they’re not trying to sell you something (though yours were straightforward).

    There are two important bits of context for the linear algebra results that the article doesn’t report, which make the numbers worthless and might make one think, uncharitably, that it’s intentionally misleading. I can’t properly square the linear algebra numbers with them and the hint about one, though.

    Saar’s DGEMM Haswell kernels in OpenBLAS list rough parity with MKL (clearly on faster hardware than I have), and the obvious rough estimate for Haswell DGEMM versus Sandy Bridge is about right.

    For the small matrix multiplications mentioned, the SC16 material referred to under https://github.com/hfp/libxsmm compares libxsmm with MKL. libxsmm is currently in epel-testing.

  11. Mike Croucher
    December 3rd, 2016 at 20:46

    @Dave: We don’t usually use the yum/apt repos to install things on our HPC cluster; I believe the same is true at Manchester and probably elsewhere. The principal reason is that we have to support multiple versions simultaneously using environment modules, which has led to the requirement to install things from source.

    Some of us are looking at projects that give us what we need without reinventing the wheel every time, such as Spack (https://github.com/LLNL/spack) and EasyBuild (https://easybuild.readthedocs.io/en/latest/), but we’ve not taken the plunge yet.

  12. Anonymous
    December 6th, 2016 at 18:53

    I don’t understand the requirement for N versions of R, especially times N2 compilers and N3 MPIs with the problems I’ve seen. Anyhow, Spack is the only reasonable way to manage such a setup I’ve seen as it provides some package management (“pack” v. “build”). That’s off the topic, though I can rant for England on the non-technical issue of reinvention.