MD5 hashing problem

November 6th, 2007 | Categories: Linux, Mathematica

A user on comp.soft-sys.math.mathematica had a query about MD5 hashes in Mathematica that caught my attention recently. I was playing with MD5 in PHP a few days ago, and one thing I discovered was that the MD5 of a string seemed to vary depending on which program you used to generate it. For example, if we use the Unix command md5sum to hash the string ‘hello’ (note that the quotes are not part of the string) as follows

echo 'hello' | md5sum

we will get

b1946ac92492d2347c6235b4d2611184

All well and good, but if we use the PHP md5 function to hash ‘hello’ (using the script here, for example), then we get

5d41402abc4b2a76b9719d911017c592

These are clearly different, which was enough to annoy at least one person. It turns out that the reason is quite straightforward. The PHP function returns the hash of the string ‘hello’, as required, but the standard Unix example returns the hash of the string ‘hello\n’, where \n stands for a newline. Initially I thought this was interesting, but then it hit me that the output of

echo 'hello'

is in fact ‘hello\n’, so no one should have been surprised really. I would have quickly forgotten about this, but someone was having a similar problem in Mathematica. In Mathematica, strings are enclosed in double quotes, so we hash the word hello as follows:

Hash["hello", "MD5"] // BaseForm[#, 16] &

5deaee1c1332199e5b5bc7c5e4f7f0c2

This is completely different from both of the cases above, so what on earth is going on? Again, the explanation turns out to be rather dull. It seems that Mathematica includes the enclosing double quotes when it produces the hash, which is not what I would expect at all. You can confirm this by running the string “hello” (including the quotes) through the PHP md5 function.
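If you don't have PHP to hand, the same comparison can be sketched in Python. This is only a minimal illustration: it assumes the standard hashlib module and UTF-8 encoding, and the helper md5_hex and the labels are names made up for this example. It hashes the three byte sequences discussed above: the bare string, the string with a trailing newline, and the string with its enclosing double quotes.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text (encoded as UTF-8) as lowercase hex."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# The three inputs discussed in the post. The labels are only for display.
candidates = {
    "bare string (what PHP's md5 hashes)": "hello",
    "with trailing newline (what echo pipes to md5sum)": "hello\n",
    "with enclosing double quotes (what Mathematica seems to hash)": '"hello"',
}

digests = {label: md5_hex(text) for label, text in candidates.items()}

for label, digest in digests.items():
    print(f"{digest}  {label}")
```

The first two lines of output should match the PHP and md5sum results above, and the third can be compared against the Mathematica result. On the shell side, something like printf '%s' 'hello' | md5sum avoids the trailing newline and should agree with PHP.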

I know it's not exactly earth-shattering stuff, but I thought that I would write it up just in case someone else wondered about this and was googling for it.

  1. March 1st, 2008 at 04:03

    Mathematica does have good reason to include the quotes. It needs to make sure that the symbol foo and the string “foo” hash to different things. (Only equivalent Mathematica expressions should hash to the same thing.) Anyway, you link to a workaround for generating hashes of the string contents in Mathematica. Really appreciate it.

  2. Robert Wallis
    October 30th, 2010 at 15:38

    Thank you, I was just about to throw Mathematica out the window.