## Archive for June, 2014

Something that became clear from my recent comparison of Numpy’s Mersenne Twister implementation with MATLAB’s is that there is something funky going on with seed 0 in MATLAB. A discussion in the comments thread helped uncover what was going on. In short, seed 0 gives exactly the same random numbers as seed 5489 in MATLAB (unless you use their deprecated rand(‘twister’,0) syntax).

This is a potential problem for anyone who performs lots of simulations that make use of random numbers such as monte-carlo simulations. One common work-flow is to run the same program hundreds of times where only the seed differs between runs. This is probably good enough to ensure that each simulation uses a random number stream that is statistically independent from all of the others — There is a risk that some streams will overlap but the probability is low and most people are content to live with that risk.

The practical upshot of this is that if you intend on sticking with Mersenne Twister for your MATLAB monte-carlo simulations, it might be wise to avoid seed 0. Alternatively, move to a random number generator that guarantees non-overlapping, independent streams – something that any implementation of Mersenne Twister cannot do.

Here’s a demo run in MATLAB 2014a on Windows 7.

>> format long >> rng(0) >> rand(1,5)' ans = 0.814723686393179 0.905791937075619 0.126986816293506 0.913375856139019 0.632359246225410 >> rng(5489) >> rand(1,5)' ans = 0.814723686393179 0.905791937075619 0.126986816293506 0.913375856139019 0.632359246225410

When porting code between MATLAB and Python, it is sometimes useful to produce the exact same set of random numbers for testing purposes. Both Python and MATLAB currently use the Mersenne Twister generator by default so one assumes this should be easy…and it is…provided you use the generator in Numpy and avoid the seed 0.

**Generate some random numbers in MATLAB**

Here, we generate the first 5 numbers for 3 different seeds in MATLAB. Our aim is to reproduce these in Python.

>> format long >> rng(0) >> rand(1,5)' ans = 0.814723686393179 0.905791937075619 0.126986816293506 0.913375856139019 0.632359246225410 >> rng(1) >> rand(1,5)' ans = 0.417022004702574 0.720324493442158 0.000114374817345 0.302332572631840 0.146755890817113 >> rng(2) >> rand(1,5)' ans = 0.435994902142004 0.025926231827891 0.549662477878709 0.435322392618277 0.420367802087489

**Python’s default random module**

According to the documentation,Python’s random module uses the Mersenne Twister algorithm but the implementation seems to be different from MATLAB’s since the results are different. Here’s the output from a fresh ipython session:

In [1]: import random In [2]: random.seed(0) In [3]: [random.random() for _ in range(5)] Out[3]: [0.8444218515250481, 0.7579544029403025, 0.420571580830845, 0.25891675029296335, 0.5112747213686085] In [4]: random.seed(1) In [5]: [random.random() for _ in range(5)] Out[5]: [0.13436424411240122, 0.8474337369372327, 0.763774618976614, 0.2550690257394217, 0.49543508709194095] In [6]: random.seed(2) In [7]: [random.random() for _ in range(5)] Out[7]: [0.9560342718892494, 0.9478274870593494, 0.05655136772680869, 0.08487199515892163, 0.8354988781294496]

**The Numpy random module**

Numpy’s random module, on the other hand, seems to use an identical implementation to MATLAB for seeds other than 0. In the below, notice that for seeds 1 and 2, the results are identical to MATLAB’s. For a seed of zero, they are different.

In [1]: import numpy as np In [2]: np.set_printoptions(suppress=True) In [3]: np.set_printoptions(precision=15) In [4]: np.random.seed(0) In [5]: np.random.random((5,1)) Out[5]: array([[ 0.548813503927325], [ 0.715189366372419], [ 0.602763376071644], [ 0.544883182996897], [ 0.423654799338905]]) In [6]: np.random.seed(1) In [7]: np.random.random((5,1)) Out[7]: array([[ 0.417022004702574], [ 0.720324493442158], [ 0.000114374817345], [ 0.30233257263184 ], [ 0.146755890817113]]) In [8]: np.random.seed(2) In [9]: np.random.random((5,1)) Out[9]: array([[ 0.435994902142004], [ 0.025926231827891], [ 0.549662477878709], [ 0.435322392618277], [ 0.420367802087489]])

**Checking a lot more seeds**

Although the above interactive experiments look convincing, I wanted to check a few more seeds. All seeds from 0 to 1 million would be a good start so I wrote a MATLAB script that generated 10 random numbers for each seed from 0 to 1 million and saved the results as a .mat file.

A subsequent Python script loads the .mat file and ensures that numpy generates the same set of numbers for each seed. It outputs every seed for which Python and MATLAB differ.

On my mac, I opened a bash prompt and ran the two scripts as follows

matlab -nodisplay -nodesktop -r "generate_matlab_randoms" python python_randoms.py

The output was

MATLAB file contains 1000001 seeds and 10 samples per seed Random numbers for seed 0 differ between MATLAB and Numpy

**System details**

- Late 2013 Macbook Air
- MATLAB 2014a
- Python 2.7.7
- Numpy 1.8.1

Here in the UK, this morning’s news is dominated by the Gameover Zeus virus and how it can hold you to ransom, empty your bank accounts and generally ruin your day!

The usual good advice on how to protect yourself from such attacks is doing the rounds but I wondered how effective one **extra** precaution might be: **Only ever log into bank accounts etc using a dedicated device**.

I’m seriously considering doing this since internet-capable devices are very cheap these days. While I’m at it, I’m thinking of taking the following **extra** precautions:

- Install Linux on the dedicated device since it is not targeted by hackers as often as Windows-based devices are.
- Create dedicated email addresses for each bank account. That way, if my normal email account were compromised, my bank accounts would still be safe.

Obviously, such a scheme would be less convenient than using whichever of my current devices I happen to be using but I’d rather that than be robbed of everything.

What do you think? Would such a scheme offer any additional protection?