Differing behaviour of numpy’s log1p function across platforms
The test suite of a project I’m working on is poking around at the extreme edges of the range of double precision numbers. I noticed a difference between Windows and other platforms that I can’t yet fully explain. On Windows, the test suite was pumping out RuntimeWarnings that we don’t see in Linux or Mac. I’ve distilled the issue down to a single numpy command:
np.log1p(1.7976931348622732e+308)
On Windows 7 with Anaconda 2.3 (Python 2.7.10), this gives a RuntimeWarning and returns inf, whereas on Linux and Mac OS X it evaluates to roughly 709.78.
Numpy version is 1.9.2 in all cases.
64 bit Windows 7
Python 2.7.10 |Continuum Analytics, Inc.| (default, May 28 2015, 16:44:52) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> import numpy as np
>>> np.log1p(1.7976931348622732e+308)
__main__:1: RuntimeWarning: overflow encountered in log1p
inf
64 bit Linux
Python 2.7.9 (default, Apr 2 2015, 15:33:21) [GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.log1p(1.7976931348622732e+308)
709.78271289338397
Mac OS X
Python 2.7.10 |Anaconda 2.3.0 (x86_64)| (default, May 28 2015, 17:04:42) [GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> import numpy as np
>>> np.log1p(1.7976931348622732e+308)
709.78271289338397
The argument to log1p is getting close to the largest double precision number:
>>> sys.float_info.max
1.7976931348623157e+308
I’ll trust the 709.xxxx answers. I tried an alternative calculation using the Decimal module in Python:
>>> from decimal import *
>>> getcontext().prec = 400
>>> (Decimal(1.7976931348622732e+308)+Decimal(1)).ln()
The answer was:
Decimal('709.7827128933839730844729653945406010028483308408967416882624245425182373438217354226633292092126001554896893194841985716513198090270719058674256471413044766448449620789865561439964561173965004395982567690334278537771153659333034417666750701631982343807434305084955327863646409139369174559366942984521874728821971739604744878331189497294407403384339452268952762722965577809095518932592274010897959168')
Using base-10 logarithms and ignoring the "1p" (the +1 is negligible at this magnitude) allowed me to simplify the calculation to something that doesn't challenge the floating point limits:
>>> (308+np.log10(1.7976931348622732)) / np.log10(np.e)
Giving an answer of 709.78271289338397, which exactly matches the Linux and Mac OS X results.
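The same cross-check can be done with only the standard library. A sketch (using math rather than numpy, but the identity is identical):

```python
import math

x = 1.7976931348622732e+308

# Rewrite ln(x) as (exponent + log10 of the mantissa) / log10(e); no
# intermediate value here gets anywhere near the double-precision maximum.
via_log10 = (308 + math.log10(1.7976931348622732)) / math.log10(math.e)

# Direct computation for comparison: log() itself handles huge finite
# inputs fine; only log1p's naive fallback has trouble.
direct = math.log(x)

print(via_log10)
print(direct)
```

Both print 709.7827... in agreement with the Linux and Mac results.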
An alternative implementation, scipy.special.log1p, gives 709.78271289338397 on both Windows and Linux.
On Windows the MSVC compiler maps long double to plain double, while on *nix platforms long double typically has extended (80-bit) or even 128-bit precision:
http://stackoverflow.com/a/5022356/2230844
I don’t have a Windows machine, but my guess is that numpy on Windows falls back to the naive implementation in https://github.com/numpy/numpy/blob/master/numpy/core/src/npymath/npy_math.c.src#L98, which computes log1p(x) as log(u) * x / (u - 1) with u = 1 + x; the factor x / (u - 1) compensates for the rounding error in u. For an argument this large, the intermediate product log(u) * x pushes the result above the floating point cutoff, and it’s all infs from then on. On *nix, npy_log1p probably just uses log1p from the C standard library, which works fine.
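That suspected fallback is easy to mirror in pure Python (a sketch, assuming the npy_math.c.src formula is log(u) * x / (u - 1) with u = 1 + x):

```python
import math

def fallback_log1p(x):
    # Python transcription of what numpy's C fallback appears to do.
    u = 1.0 + x
    if u == 1.0:
        return x  # x is below double epsilon, so log1p(x) ~= x
    # The factor x / (u - 1) corrects for rounding in u, but the
    # intermediate product log(u) * x overflows for huge x.
    return math.log(u) * x / (u - 1.0)

x = 1.7976931348622732e+308
print(fallback_log1p(x))  # inf: log(u) * x exceeds the double maximum
print(math.log1p(x))      # ~709.78 via the platform's C library log1p
```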
To verify this, I created a small C program that links in both implementations (see https://gist.github.com/jvkersch/686051b6f7331c44add9). The native implementation in numpy (on my Macbook Pro) indeed prints out inf, while the other code path returns the 709… result.
As discussed with you this morning, the problem is that on Windows numpy is using its own fallback. Presumably Windows doesn’t provide a system log1p, but Mac and Linux typically do.
numpy’s log1p is here:
https://github.com/numpy/numpy/blob/467d4e16d77a2e7c131aac53c639e82b754578c7/numpy/core/src/npymath/npy_math.c.src#L98
(for arguments “close” to the floating point maximum, `log(1+x) * x` overflows.)
As x gets large, log(x) and log1p(x) converge: the difference between them is log(1 + 1/x) ≈ 1/x, which shrinks rapidly. Eventually both log(x) and log1p(x) round to the same floating point number. This seems to happen at about 2**49.
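A quick stdlib check of that convergence (math.log1p wraps the platform log1p, so exact bit-for-bit agreement may vary slightly between libm implementations):

```python
import math

# log1p(x) - log(x) = log(1 + 1/x) ~ 1/x, which for x = 2**60 is far
# below one ulp of the ~41.6 result, so the two are indistinguishable.
x = 2.0**60
print(math.log(x))
print(math.log1p(x))
```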
Observation: since Python 2.6, Python itself has a math.log1p function, so numpy could fall back to using that.
We could change numpy’s fallback so that it has an additional case for large x:
if (x > 0x1p49) {
    return npy_log(x);
}
Oh yukh, I suppose the actual threshold for when log(x) and log1p(x) are the same depends on the floating point precision. But that’s numpy’s problem.
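Putting that guard into a Python transcription of the fallback (a sketch; both the 2**49 threshold and the fallback formula are assumptions from the discussion above):

```python
import math

def patched_log1p(x):
    # Proposed extra case: for large x, 1 + x == x in double precision,
    # so log1p(x) can safely be computed as plain log(x), no overflow.
    if x > 2.0**49:
        return math.log(x)
    u = 1.0 + x
    if u == 1.0:
        return x
    return math.log(u) * x / (u - 1.0)

x = 1.7976931348622732e+308
print(patched_log1p(x))  # ~709.78 instead of inf
```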
@Will Furnass
Thanks, Will. I think that using the scipy implementation will help with our portability! Cheers!
Hi,
Any reason why you would want to use log1p for such large arguments? Typically, you use log1p for problems where you want to calculate the logarithm of a number that is very close (IEEE double precision close) to 1.0.
I understand the compatibility argument, though. But you’re really way beyond the applicability range of log1p. As David Jones says, log(x) and log1p(x) for very large numbers will be the same (read: floating point-wise the same).
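The intended use case is easy to demonstrate with the standard library:

```python
import math

# log1p earns its keep when the argument is tiny: 1 + x is rounded
# before log() ever sees it, destroying most of the significant digits.
x = 1e-15
print(math.log1p(x))    # ~1.0e-15, accurate
print(math.log(1 + x))  # ~1.11e-15, wrong in the first significant digit
```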
Hi Gert
I agree…it’s an odd thing to do.
I have no idea why log1p is being used to be honest. I was simply giving the project’s test suite a shake-down and noticed this issue on Windows.
Cheers,
Mike