_ and R: How R relates to other technologies
October 13th, 2016
| Categories: math software, R, RSE, Scientific Software, University of Sheffield
I was recently invited to give a talk at the Sheffield R Users Group and decided to give a brief overview of how R relates to other technologies. Subjects included Mathematica’s integration of R; Intel’s compilers and Math Kernel Library and how they can make R faster; and a range of Microsoft technologies including R Tools for Visual Studio, Microsoft R Open and MRAN for reproducibility. I also touched upon the NAG Library, Maple’s code generation for R, GPUs and Spark.
- Slide deck: _ and R: How R relates to other technologies
Did I miss anything? If you were to give a similar talk, what might you have included?
How does “Accelerated versions of R for Iceberg” compare with Microsoft R Open?
Which is faster?
Is there any reason not to use them, such as compatibility issues?
I would add Julia’s interop with R. A lot of people are heavy R users but are interested in Julia for faster development than they would get with C++. RCall is a very mature library for calling R from Julia: either use the string macro R”” or enter an R terminal by typing $ at the empty REPL. You also have RJulia for the other direction: letting Julia be a drop-in replacement for C++ and giving access to its libraries.
@oversky
For linear algebra, they are likely the same speed. Both are using the Intel MKL to do the heavy lifting.
The build I did on Iceberg used both the Intel MKL and the Intel Compiler. I’m not sure which compiler Microsoft R Open used. It’s possible that one is faster than the other in certain circumstances (at a guess, anything that uses Rcpp and is numerically intensive is a candidate), but I have no data.
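If you want to compare two builds yourself, a quick first step is to confirm which LAPACK each is linked against and then time a BLAS-dominated operation. A minimal sketch in base R (the matrix size and seed are arbitrary choices for illustration):

    # Report the LAPACK version this build is linked against;
    # on recent versions of R, sessionInfo() also lists the BLAS
    # and LAPACK libraries in use
    La_version()

    # Time a DGEMM-dominated workload; this mostly measures the
    # underlying BLAS rather than R itself
    set.seed(42)
    n <- 5000
    A <- matrix(rnorm(n * n), nrow = n)
    B <- matrix(rnorm(n * n), nrow = n)
    system.time(A %*% B)

Run the same script under each build and the matrix-multiply timing gives a rough like-for-like comparison of the linear algebra stack.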
There should be no compatibility issues (all the tests pass in my Iceberg build, for example) with the following caveats:
– Different implementations of LAPACK might return different (but still mathematically correct!) results for some operations where the answer is not guaranteed to be unique; see the sketch after this list.
– Most Rcpp routines are probably developed with the gcc compiler in mind, so there may be differences between gcc and Intel that stop some things compiling. I haven’t found any yet, though.
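To illustrate the first caveat, here is a minimal sketch in base R (the matrix and seed are arbitrary): eigenvectors are only determined up to sign, so two LAPACK builds can return vectors differing by a factor of -1 and both be correct.

    # Symmetric matrix, so the eigendecomposition is real
    set.seed(1)
    S <- crossprod(matrix(rnorm(16), nrow = 4))
    e <- eigen(S)
    v <- e$vectors[, 1]
    lambda <- e$values[1]

    # Both v and -v satisfy S v = lambda v, so either is a
    # mathematically correct result for LAPACK to return
    all.equal(as.vector(S %*% v), lambda * v)
    all.equal(as.vector(S %*% (-v)), lambda * (-v))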
@Chris Rackauckas
I like your blog’s title :)
We have recently been working with Sisense for Business Intelligence, and it has R integration available for heavy operations.
I think the bit about Intel tools is drawing the wrong conclusions.
I assume Iceberg is running something RHEL-like. In that case, the only reason for needing anything other than the system R packages for fast linear algebra is that Fedora/EPEL doesn’t have a sensible linear algebra policy like Debian’s, and some things are unfortunately packaged. Anyhow, the R packages in the epel-testing repo now use an essentially optimal level 3 BLAS for x86 SIMD below AVX512 (which is infinitely faster than MKL on other architectures, of course).
Previously you could rely on a hack to fix use of slow BLAS system-wide, and this business still needs to be tackled in Fedora with policy. See https://loveshack.fedorapeople.org/blas-subversion.html for discussion, figures, and a circular reference.
For use of R on an HPC system, what about pbdR?
Thanks Dave, great information. I’ll take a look at building with OpenBLAS next time.
This March 2016 article from Intel shows MKL beating OpenBLAS across the board – https://software.intel.com/en-us/articles/performance-comparison-of-openblas-and-intel-math-kernel-library-in-r
It used OpenBLAS 0.2.14, which is older than the version referenced in your post, and there are no links to the benchmark code.
We have a new Haswell system under development. It will be interesting (to me at least) to run these benchmarks again, including OpenBLAS, on that system.
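For anyone wanting to rerun a comparison like this, here is a minimal sketch of the sort of BLAS/LAPACK-heavy kernels worth timing under each build (the sizes are arbitrary, and results will depend on hardware and threading settings):

    set.seed(1)
    n <- 4000
    A <- matrix(rnorm(n * n), nrow = n)

    system.time(crossprod(A))                    # t(A) %*% A: DSYRK
    system.time(svd(A, nu = 0, nv = 0))          # singular values only
    system.time(solve(crossprod(A) + diag(n)))   # dense inverse via LU

Publishing a script like this alongside the timings avoids the “no links to benchmark code” problem noted above.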
@Mike Croucher
But you could just do a yum/dnf/apt-get install.
@Mike Croucher
Proper reporting of experimental results — we’ve heard of it. As an unapologetic experimentalist, I rarely see “benchmark” results that report enough for useful insight, even when they’re not trying to sell you something (though yours were straightforward).
There are two important bits of context for the linear algebra results that article doesn’t report, which make the numbers worthless and might make one think uncharitably that it’s intentionally misleading. I can’t properly square the LA numbers with them and the hint about one, though.
Saar’s DGEMM Haswell kernels in OpenBLAS list their rough parity with MKL — clearly on faster hardware than I have — and the obvious rough estimate for Haswell DGEMM v. Sandybridge is right.
For the small MM mentioned, the SC16 material referred to under https://github.com/hfp/libxsmm compares libxsmm with MKL. libxsmm is currently in epel-testing.
@Dave – We don’t usually use the yum / apt repos to install things on our HPC cluster. I believe the same is true at Manchester, and probably elsewhere. The principal reason is that we have to support multiple versions simultaneously using environment modules, which has led to the requirement to install things from source.
Some of us are looking at projects that give us what we need without having to reinvent the wheel every time, such as Spack https://github.com/LLNL/spack and EasyBuild https://easybuild.readthedocs.io/en/latest/, but we’ve not taken the plunge yet.
I don’t understand the requirement for N versions of R, especially times N₂ compilers and N₃ MPIs, with the problems I’ve seen. Anyhow, Spack is the only reasonable way I’ve seen to manage such a setup, as it provides some package management (“pack” v. “build”). That’s off the topic, though I can rant for England on the non-technical issue of reinvention.