Reproducing MATLAB random numbers in Python
When porting code between MATLAB and Python, it is sometimes useful to produce the exact same set of random numbers for testing purposes. Both Python and MATLAB currently use the Mersenne Twister generator by default so one assumes this should be easy…and it is…provided you use the generator in Numpy and avoid the seed 0.
Generate some random numbers in MATLAB
Here, we generate the first 5 numbers for 3 different seeds in MATLAB. Our aim is to reproduce these in Python.
>> format long >> rng(0) >> rand(1,5)' ans = 0.814723686393179 0.905791937075619 0.126986816293506 0.913375856139019 0.632359246225410 >> rng(1) >> rand(1,5)' ans = 0.417022004702574 0.720324493442158 0.000114374817345 0.302332572631840 0.146755890817113 >> rng(2) >> rand(1,5)' ans = 0.435994902142004 0.025926231827891 0.549662477878709 0.435322392618277 0.420367802087489
Python’s default random module
According to the documentation,Python’s random module uses the Mersenne Twister algorithm but the implementation seems to be different from MATLAB’s since the results are different. Here’s the output from a fresh ipython session:
In [1]: import random In [2]: random.seed(0) In [3]: [random.random() for _ in range(5)] Out[3]: [0.8444218515250481, 0.7579544029403025, 0.420571580830845, 0.25891675029296335, 0.5112747213686085] In [4]: random.seed(1) In [5]: [random.random() for _ in range(5)] Out[5]: [0.13436424411240122, 0.8474337369372327, 0.763774618976614, 0.2550690257394217, 0.49543508709194095] In [6]: random.seed(2) In [7]: [random.random() for _ in range(5)] Out[7]: [0.9560342718892494, 0.9478274870593494, 0.05655136772680869, 0.08487199515892163, 0.8354988781294496]
The Numpy random module
Numpy’s random module, on the other hand, seems to use an identical implementation to MATLAB for seeds other than 0. In the below, notice that for seeds 1 and 2, the results are identical to MATLAB’s. For a seed of zero, they are different.
In [1]: import numpy as np In [2]: np.set_printoptions(suppress=True) In [3]: np.set_printoptions(precision=15) In [4]: np.random.seed(0) In [5]: np.random.random((5,1)) Out[5]: array([[ 0.548813503927325], [ 0.715189366372419], [ 0.602763376071644], [ 0.544883182996897], [ 0.423654799338905]]) In [6]: np.random.seed(1) In [7]: np.random.random((5,1)) Out[7]: array([[ 0.417022004702574], [ 0.720324493442158], [ 0.000114374817345], [ 0.30233257263184 ], [ 0.146755890817113]]) In [8]: np.random.seed(2) In [9]: np.random.random((5,1)) Out[9]: array([[ 0.435994902142004], [ 0.025926231827891], [ 0.549662477878709], [ 0.435322392618277], [ 0.420367802087489]])
Checking a lot more seeds
Although the above interactive experiments look convincing, I wanted to check a few more seeds. All seeds from 0 to 1 million would be a good start so I wrote a MATLAB script that generated 10 random numbers for each seed from 0 to 1 million and saved the results as a .mat file.
A subsequent Python script loads the .mat file and ensures that numpy generates the same set of numbers for each seed. It outputs every seed for which Python and MATLAB differ.
On my mac, I opened a bash prompt and ran the two scripts as follows
matlab -nodisplay -nodesktop -r "generate_matlab_randoms" python python_randoms.py
The output was
MATLAB file contains 1000001 seeds and 10 samples per seed Random numbers for seed 0 differ between MATLAB and Numpy
System details
- Late 2013 Macbook Air
- MATLAB 2014a
- Python 2.7.7
- Numpy 1.8.1
What is special about seed 0? I remember reading 3rd-party code that considered seed 0 a special case. Does anyone have an explanation or reference?
Matlab and Numpy has already equal random numbers.
In Matlab
>> rand(‘twister’,0)
>> rand
ans =
0.548813503927325
In Numpy
>>> import numpy
>>> numpy.random.seed(0)
>>> numpy.random.random()
0.5488135039273248
I wonder if it is also worth checking that if you pick one seed and then generate a million random numbers that they are the same.
@markuman – That syntax is discouraged by Mathworks.
http://www.mathworks.co.uk/help/matlab/math/updating-your-random-number-generator-syntax.html
Instead of
rand(‘twister’,seed)
They suggest that we should use
rng(seed)
The document linked to above gives us the key as to why a seed of zero is special in MATLAB. Mathworks have chosen a seed of zero to be ‘default’ and it seems that they have chosen a seed of 5489 to be this default and so, in MATLAB, a seed of 0 is equivalent to 5489….at least if you use the rng(seed) syntax”
>> rng default
>> rand(1,5)’
ans =
0.8147
0.9058
0.1270
0.9134
0.6324
>> rng(0)
>> rand(1,5)’
ans =
0.8147
0.9058
0.1270
0.9134
0.6324
>>rng(5489)
>> rand(1,5)’
ans =
0.8147
0.9058
0.1270
0.9134
0.6324
Thanks for the questions, they helped me realise why 0 is special :)
In the reference implementation (http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html), if init_genrand() had not been called before to initialize the state vector, it is then initialized with a hardcoded seed value of 5489 upon the first call to genrand_int32().
Now MATLAB interprets rng(0) as a special case to initializing the state vector with the default hardcoded value 5489 that the original authors used. This is how the RNG is initialized when MATLAB starts.
NumPy on the other hand does not have a fixed default value for the seed (the default is “None” which tells it to get a random seed from /dev/urandom or some equivalent). So a seed value of 0 is literally interpreted as such, and used as-is to initialize the internal state vector.
Reference:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.RandomState.html
https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/randomkit.c#L138
https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L640
In the old MATLAB syntax of rand(‘twister’,0), apparently the seed value of zero is interpreted as is as well. You can verify this fact by inspecting the state vector:
>> rand(‘twister’,0) % old style
>> r=rng;
>> r.State{4} % r.State is a 1×6 cell array, contains both integers and floats (?)
versus:
>>> np.random.seed(0)
>>> np.random.get_state()[1]
Thanks for the follow up, Amro. Greatly appreciated.